Description Usage Arguments Details Value Note Author(s) References See Also Examples
Most classifiers implemented in this package depend on one or even several hyperparameters (see Details) that should be optimized to obtain good (and comparable!) results. As tuning scheme, we propose three-fold cross-validation on each learningset (for fixed selected variables). Note that learningsets usually do not contain the complete dataset, so tuning involves a second level of splitting the dataset. Increasing the number of folds leads to larger (inner) training sets and possibly to higher accuracy, but also to higher computing times.
For S4 method information, see tune-methods.
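To make the two levels of splitting concrete, here is a minimal sketch, reusing the golub data exactly as prepared in the Examples below and the default k grid for knnCMA listed under Details (tuneres_knn is just an illustrative object name):

set.seed(111)
## outer level: five learningsets, each the training part of a CV split of the full data
lset <- GenerateLearningsets(y = golubY, method = "CV", fold = 5, strat = TRUE)
## inner level: tune() splits each learningset again into 'fold' parts (three by default)
## and evaluates every candidate value of k on the inner test folds
tuneres_knn <- tune(X = golubX, y = golubY, learningsets = lset,
                    genesellist = list(method = "t.test"), nbgene = 100,
                    classifier = knnCMA, fold = 3, grids = list(k = 1:10))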
X: Gene expression data. Can be one of the following: a matrix, a data.frame, or an object of class ExpressionSet.
y: Class labels. Can be one of the following: a numeric vector, a factor, or a character string (the name of the phenotype variable, if X is an ExpressionSet).
f: A two-sided formula, if X is a data.frame; the left-hand side specifies the class labels, the right-hand side the variables.
learningsets: An object of class learningsets, usually the result of a call to GenerateLearningsets (see the Examples).
genesel: Optional (but usually recommended) object of class genesel, as produced by GeneSelection for the same learningsets.
genesellist: In the case that the argument genesel is missing, a named list of arguments passed to GeneSelection in order to perform gene selection internally (see the Examples).
nbgene: Number of best genes to be kept for classification, based on either genesel or the gene selection performed via genesellist.
classifier: Name of the classifier function to be used; these functions end with CMA (e.g. compBoostCMA, knnCMA, svmCMA).
fold: The number of cross-validation folds used within each learningset. Default is 3.
strat: Should stratified cross-validation according to the class proportions in the complete dataset be used? Default is FALSE.
grids: A named list. The names correspond to the arguments to be tuned, e.g. k for knnCMA or cost for svmCMA; each element is a vector of candidate values (see the sketch after this list and the default settings under Details).
trace: Should progress be traced? Default is TRUE.
...: Further arguments to be passed to classifier, e.g. kernel for svmCMA; these are fixed, not tuned.
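For instance, a two-dimensional grid for nnetCMA, reusing the candidate values listed under Details, could be specified as follows (a sketch; mygrids is an arbitrary name):

## names = arguments of the classifier to be tuned, elements = candidate values
mygrids <- list(size = 1:5,               # number of hidden units in nnetCMA
                decay = c(0, 2^{-(4:1)})) # weight decay in nnetCMA
## then passed on as: tune(..., classifier = nnetCMA, grids = mygrids)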
The following default settings are used if the argument grids is an empty list:
gbmCMA: n.trees = c(50, 100, 200, 500, 1000)
compBoostCMA: mstop = c(50, 100, 200, 500, 1000)
LassoCMA: norm.fraction = seq(from = 0.1, to = 0.9, length = 9)
ElasticNetCMA: norm.fraction = seq(from = 0.1, to = 0.9, length = 5), alpha = 2^{-(5:1)}
plrCMA: lambda = 2^{-4:4}
pls_ldaCMA: comp = 1:10
pls_lrCMA: comp = 1:10
pls_rfCMA: comp = 1:10
rfCMA: mtry = ceiling(c(0.1, 0.25, 0.5, 1, 2) * sqrt(ncol(X))), nodesize = c(1, 2, 3)
knnCMA: k = 1:10
pknnCMA: k = 1:10
scdaCMA: delta = c(0.1, 0.25, 0.5, 1, 2, 5)
pnnCMA: sigma = c(2^{-2:2})
nnetCMA: size = 1:5, decay = c(0, 2^{-(4:1)})
svmCMA, kernel = "linear": cost = c(0.1, 1, 5, 10, 50, 100, 500)
svmCMA, kernel = "radial": cost = c(0.1, 1, 5, 10, 50, 100, 500), gamma = 2^{-2:2}
svmCMA, kernel = "polynomial": cost = c(0.1, 1, 5, 10, 50, 100, 500), degree = 2:4
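For example, calling tune with an empty grids list makes it fall back to the cost grid above, while kernel is handed on to svmCMA via the ... argument (a sketch reusing golubX, golubY and lset from the Examples; tuneres_svm is just an illustrative name):

tuneres_svm <- tune(X = golubX, y = golubY, learningsets = lset,
                    genesellist = list(method = "t.test"), nbgene = 100,
                    classifier = svmCMA, kernel = "linear",
                    grids = list())   # empty list: the default cost grid is used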
An object of class tuningresult
The computation time can be enormously high. Note that for each learningset, the classifier must be trained (fold x number of candidate hyperparameter combinations) times. E.g., if there are fifty learningsets, fold = 3, and two hyperparameters (each with 5 candidate values) are tuned, 50 x 3 x 25 = 3750 training runs are necessary!
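That count is easy to verify in plain R (numbers taken from the example above):

n_learningsets <- 50         # number of learningsets
fold <- 3                    # inner cross-validation folds
grid_sizes <- c(5, 5)        # two hyperparameters, 5 candidate values each
n_learningsets * fold * prod(grid_sizes)   # = 3750 training runs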
Martin Slawski ms@cs.uni-sb.de
Anne-Laure Boulesteix boulesteix@ibe.med.uni-muenchen.de
Christoph Bernau bernau@ibe.med.uni-muenchen.de
Slawski, M., Daumer, M., Boulesteix, A.-L. (2008). CMA - A comprehensive Bioconductor package for supervised classification with high dimensional data. BMC Bioinformatics 9: 439.
tuningresult, GeneSelection, classification
## Not run:
### simple example for a one-dimensional grid, using compBoostCMA.
### dataset
data(golub)
golubY <- golub[,1]
golubX <- as.matrix(golub[,-1])
### learningsets
set.seed(111)
lset <- GenerateLearningsets(y = golubY, method = "CV", fold = 5, strat = TRUE)
### tuning after gene selection with the t.test
tuneres <- tune(X = golubX, y = golubY, learningsets = lset,
genesellist = list(method = "t.test"),
classifier = compBoostCMA, nbgene = 100,
grids = list(mstop = c(50, 100, 250, 500, 1000)))
### inspect results
show(tuneres)
best(tuneres)
plot(tuneres, iter = 3)
## End(Not run)