Description

This function trains a machine learning model on the training data.
Usage

train.model(siamcat, method = c("lasso", "enet", "ridge", "lasso_ll",
    "ridge_ll", "randomForest"), stratify = TRUE, modsel.crit = list("auc"),
    min.nonzero.coeff = 1, param.set = NULL, perform.fs = FALSE,
    param.fs = list(thres.fs = 100, method.fs = "AUC", direction = 'absolute'),
    feature.type = 'normalized', verbose = 1)
Arguments

siamcat
    object of class siamcat-class

method
    string, specifies the type of model to be trained; may be one of
    'lasso', 'enet', 'ridge', 'lasso_ll', 'ridge_ll', or 'randomForest'

stratify
    boolean, should the folds in the internal cross-validation be
    stratified? Defaults to TRUE

modsel.crit
    list, specifies the model selection criterion during internal
    cross-validation; defaults to list('auc')

min.nonzero.coeff
    integer, minimum number of nonzero coefficients that should be
    present in the model (only for the glmnet-based methods 'lasso',
    'enet', and 'ridge'); defaults to 1

param.set
    list, set of extra parameters for the mlr run; see Details.
    Defaults to NULL

perform.fs
    boolean, should feature selection be performed? Defaults to FALSE

param.fs
    list, parameters for the feature selection; see Details. Defaults to
    list(thres.fs = 100, method.fs = "AUC", direction = 'absolute')

feature.type
    string, on which type of features should the function work?
    Defaults to 'normalized'

verbose
    integer, controls the level of output; defaults to 1
Details

This function performs the training of the machine learning model and
functions as an interface to the mlr package. It expects a
siamcat-class object with a prepared cross-validation (see
create.data.split) in the data_split slot of the object and then
trains a model for each fold of the data split.

For the machine learning methods that require additional
hyperparameters (e.g. lasso_ll), the optimal hyperparameters are tuned
with the function tuneParams within the mlr package.
The different machine learning methods are implemented as mlr tasks:
'lasso', 'enet', and 'ridge' use the 'classif.cvglmnet' Learner;
'lasso_ll' and 'ridge_ll' use the 'classif.LiblineaRL1LogReg' and
'classif.LiblineaRL2LogReg' Learners, respectively; 'randomForest' is
implemented via the 'classif.randomForest' Learner.
Hyperparameters

You also have additional control over the machine learning procedure
by supplying information through the param.set parameter of the
function. We encourage you to check out the excellent mlr
documentation for more in-depth information. Here is a short overview
of which parameters you can supply in which form:
enet: The alpha parameter describes the mixture between the lasso and
ridge penalties and is, by default, determined using internal
cross-validation (the default is equivalent to
param.set = list('alpha' = c(0, 1))). You can supply either the limits
of the hyperparameter exploration (e.g. with limits 0.2 and 0.8:
param.set = list('alpha' = c(0.2, 0.8))) or a fixed alpha value
(param.set = list('alpha' = 0.5)).
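As a sketch, restricting the alpha search range could look like this
(the train.model call is commented out and `sc.obj` is an illustrative
placeholder for a siamcat-class object with a prepared data split):

```r
# Restrict the internal tuning of the elastic net mixing parameter
# alpha to the range [0.2, 0.8] instead of the default [0, 1]:
param.set <- list('alpha' = c(0.2, 0.8))

# Illustrative call on a placeholder siamcat-class object:
# sc.obj <- train.model(sc.obj, method = 'enet', param.set = param.set)
```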
lasso_ll/ridge_ll: You can supply both the class.weights and the cost
parameter (cost of the constraints violation, see LiblineaR for more
info). The default values are equivalent to
param.set = list('class.weights' = c(5, 1),
'cost' = 10^seq(-2, 3, length = 6 + 5 + 10)).
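For illustration, the default grid above can be written out explicitly
(again, `sc.obj` is a placeholder object name, not part of the
package):

```r
# Reproduce the documented default cost grid (21 values spanning
# 1e-2 to 1e3 on a log scale) and the default class weights:
param.set <- list('class.weights' = c(5, 1),
                  'cost' = 10^seq(-2, 3, length = 6 + 5 + 10))

# Illustrative call on a placeholder siamcat-class object:
# sc.obj <- train.model(sc.obj, method = 'lasso_ll', param.set = param.set)
```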
randomForest: You can supply the two parameters ntree (number of trees
to grow) and mtry (number of variables randomly sampled as candidates
at each split). See also randomForest for more info. The default
values correspond to
param.set = list('ntree' = c(100, 1000), 'mtry' =
c(round(sqrt.mdim / 2), round(sqrt.mdim), round(sqrt.mdim * 2)))
with sqrt.mdim = sqrt(nrow(data)).
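A minimal sketch of the same default, assuming a hypothetical feature
matrix with 400 features (the value of `p` and the object name
`sc.obj` are illustrative only):

```r
# Explore three mtry values around sqrt(p), where p is the number of
# features; the package default derives p from the data itself.
p <- 400                      # illustrative feature count
sqrt.mdim <- sqrt(p)
param.set <- list('ntree' = c(100, 1000),
                  'mtry'  = c(round(sqrt.mdim / 2),
                              round(sqrt.mdim),
                              round(sqrt.mdim * 2)))

# Illustrative call on a placeholder siamcat-class object:
# sc.obj <- train.model(sc.obj, method = 'randomForest',
#                       param.set = param.set)
```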
Feature selection

The function can also perform feature selection on each individual
fold. At the moment, three methods for feature selection are
implemented:

'AUC' - computes the Area Under the Receiver Operating Characteristics
Curve for each single feature and selects the top param.fs$thres.fs
(e.g. 100) features

'gFC' - computes the generalized Fold Change (see check.associations)
for each feature and likewise selects the top param.fs$thres.fs
(e.g. 100) features

'Wilcoxon' - computes the p-value for each single feature with the
Wilcoxon test and selects features with a p-value smaller than
param.fs$thres.fs
For 'AUC' and 'gFC', the feature selection can also be directed, that
is, the features will be selected either based on the overall
association ('absolute' - gFC values will be converted to absolute
values and AUC values below 0.5 will be converted by 1 - AUC), or on
associations in a certain direction ('positive' - positive enrichment
as measured by positive values of the gFC or AUC values higher than
0.5 - and conversely for 'negative').
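Putting the pieces above together, a hedged sketch of a param.fs
configuration might look like this (the threshold of 50 and the object
name `sc.obj` are illustrative choices, not package defaults):

```r
# Select the 50 features with the highest absolute generalized fold
# change on each cross-validation fold:
param.fs <- list(thres.fs  = 50,
                 method.fs = "gFC",
                 direction = 'absolute')

# Illustrative call on a placeholder siamcat-class object:
# sc.obj <- train.model(sc.obj, method = 'lasso',
#                       perform.fs = TRUE, param.fs = param.fs)
```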
Value

object of class siamcat-class with added model_list
Examples

data(siamcat_example)
# simple working example
siamcat_example <- train.model(siamcat_example, method = 'lasso')