Description

This function trains a machine learning model on the training data.
Usage

train.model(siamcat, method = c("lasso", "enet", "ridge", "lasso_ll",
    "ridge_ll", "randomForest"), stratify = TRUE, modsel.crit = list("auc"),
    min.nonzero.coeff = 1, param.set = NULL, perform.fs = FALSE,
    param.fs = list(thres.fs = 100, method.fs = "AUC", direction = 'absolute'),
    feature.type = 'normalized', verbose = 1)
Arguments

siamcat
    object of class siamcat-class

method
    string, specifies the type of model to be trained; may be one of
    'lasso', 'enet', 'ridge', 'lasso_ll', 'ridge_ll', or 'randomForest'

stratify
    boolean, should the folds in the internal cross-validation be
    stratified? Defaults to TRUE

modsel.crit
    list, specifies the model selection criterion during internal
    cross-validation; defaults to list('auc')

min.nonzero.coeff
    integer, minimum number of nonzero coefficients that should be
    present in the model (only for the glmnet-based methods 'lasso',
    'enet', and 'ridge'); defaults to 1

param.set
    list, set of extra parameters for the mlr run; see Details.
    Defaults to NULL

perform.fs
    boolean, should feature selection be performed? Defaults to FALSE

param.fs
    list, parameters for the feature selection; see Details. Defaults to
    list(thres.fs = 100, method.fs = "AUC", direction = 'absolute')

feature.type
    string, on which type of features should the function work?
    Defaults to 'normalized'

verbose
    integer, controls the level of output; defaults to 1
Details

This function performs the training of the machine learning model and
functions as an interface to the mlr package. It expects a
siamcat-class object with a prepared cross-validation (see
create.data.split) in the data_split slot of the object and then
trains a model for each fold of the data split.

For the machine learning methods that require additional
hyperparameters (e.g. lasso_ll), the optimal hyperparameters are tuned
with the function tuneParams within the mlr package.
The different machine learning methods are implemented as mlr tasks:
'lasso', 'enet', and 'ridge' use the 'classif.cvglmnet' Learner;
'lasso_ll' and 'ridge_ll' use the 'classif.LiblineaRL1LogReg' and
'classif.LiblineaRL2LogReg' Learners, respectively; 'randomForest' is
implemented via the 'classif.randomForest' Learner.
Hyperparameters

You also have additional control over the machine learning procedure
by supplying information through the param.set parameter of the
function. We encourage you to check out the excellent mlr
documentation for more in-depth information. Here is a short overview
of which parameters you can supply in which form:
enet: The alpha parameter describes the mixture between the lasso and
ridge penalties and is, by default, determined using internal
cross-validation (the default is equivalent to
param.set = list('alpha' = c(0, 1))). You can supply either the limits
of the hyperparameter exploration (e.g. with limits 0.2 and 0.8:
param.set = list('alpha' = c(0.2, 0.8))) or a fixed alpha value
(param.set = list('alpha' = 0.5)).
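As a sketch, restricting the alpha search range could look like this
(the train.model call is commented out and `sc.obj` is an illustrative
placeholder for a siamcat-class object with a prepared data split):

```r
# Restrict the internal tuning of the elastic net mixing parameter
# alpha to the range [0.2, 0.8] instead of the default [0, 1]:
param.set <- list('alpha' = c(0.2, 0.8))

# Illustrative call on a placeholder siamcat-class object:
# sc.obj <- train.model(sc.obj, method = 'enet', param.set = param.set)
```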
lasso_ll/ridge_ll: You can supply both the class.weights and the cost
parameter (cost of the constraints violation, see LiblineaR for more
info). The default values are equivalent to
param.set = list('class.weights' = c(5, 1),
'cost' = 10^seq(-2, 3, length = 6 + 5 + 10)).
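For illustration, the default grid above can be written out explicitly
(again, `sc.obj` is a placeholder object name, not part of the
package):

```r
# Reproduce the documented default cost grid (21 values spanning
# 1e-2 to 1e3 on a log scale) and the default class weights:
param.set <- list('class.weights' = c(5, 1),
                  'cost' = 10^seq(-2, 3, length = 6 + 5 + 10))

# Illustrative call on a placeholder siamcat-class object:
# sc.obj <- train.model(sc.obj, method = 'lasso_ll', param.set = param.set)
```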
randomForest: You can supply the two parameters ntree (number of trees
to grow) and mtry (number of variables randomly sampled as candidates
at each split). See also randomForest for more info. The default
values correspond to
param.set = list('ntree' = c(100, 1000), 'mtry' =
c(round(sqrt.mdim / 2), round(sqrt.mdim), round(sqrt.mdim * 2)))
with sqrt.mdim = sqrt(nrow(data)).
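A minimal sketch of the same default, assuming a hypothetical feature
matrix with 400 features (the value of `p` and the object name
`sc.obj` are illustrative only):

```r
# Explore three mtry values around sqrt(p), where p is the number of
# features; the package default derives p from the data itself.
p <- 400                      # illustrative feature count
sqrt.mdim <- sqrt(p)
param.set <- list('ntree' = c(100, 1000),
                  'mtry'  = c(round(sqrt.mdim / 2),
                              round(sqrt.mdim),
                              round(sqrt.mdim * 2)))

# Illustrative call on a placeholder siamcat-class object:
# sc.obj <- train.model(sc.obj, method = 'randomForest',
#                       param.set = param.set)
```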
Feature selection

The function can also perform feature selection on each individual
fold. At the moment, three methods for feature selection are
implemented:

'AUC' - computes the Area Under the Receiver Operating Characteristics
Curve for each single feature and selects the top param.fs$thres.fs
(e.g. 100) features

'gFC' - computes the generalized Fold Change (see check.associations)
for each feature and likewise selects the top param.fs$thres.fs
(e.g. 100) features

'Wilcoxon' - computes the p-value for each single feature with the
Wilcoxon test and selects features with a p-value smaller than
param.fs$thres.fs
For 'AUC' and 'gFC', the feature selection can also be directed, that
is, the features will be selected either based on the overall
association ('absolute' - gFC values will be converted to absolute
values and AUC values below 0.5 will be converted by 1 - AUC), or on
associations in a certain direction ('positive' - positive enrichment
as measured by positive values of the gFC or AUC values higher than
0.5 - and conversely for 'negative').
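Putting the pieces above together, a hedged sketch of a param.fs
configuration might look like this (the threshold of 50 and the object
name `sc.obj` are illustrative choices, not package defaults):

```r
# Select the 50 features with the highest absolute generalized fold
# change on each cross-validation fold:
param.fs <- list(thres.fs  = 50,
                 method.fs = "gFC",
                 direction = 'absolute')

# Illustrative call on a placeholder siamcat-class object:
# sc.obj <- train.model(sc.obj, method = 'lasso',
#                       perform.fs = TRUE, param.fs = param.fs)
```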
Value

object of class siamcat-class with added model_list
Examples

data(siamcat_example)
# simple working example
siamcat_example <- train.model(siamcat_example, method = 'lasso')