train.model: Model training


View source: R/train_model.r

Description

This function trains a machine learning model on the training data.

Usage

train.model(siamcat, method = c("lasso", "enet", "ridge", "lasso_ll",
"ridge_ll", "randomForest"), stratify = TRUE, modsel.crit = list("auc"),
min.nonzero.coeff = 1, param.set = NULL, perform.fs = FALSE, param.fs =
list(thres.fs = 100, method.fs = "AUC", direction='absolute'),
feature.type='normalized', verbose = 1)

Arguments

siamcat

object of class siamcat-class

method

string, specifies the type of model to be trained, may be one of these: c('lasso', 'enet', 'ridge', 'lasso_ll', 'ridge_ll', 'randomForest')

stratify

boolean, should the folds in the internal cross-validation be stratified? Defaults to TRUE
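
To illustrate what a stratified split means, here is a minimal base-R sketch. It is not SIAMCAT's own implementation. The helper stratified_folds() is hypothetical and assumes a plain vector of class labels (coded -1/1 here). The samples of each class are shuffled and dealt out round-robin, so every fold keeps roughly the class proportions of the whole dataset.

```r
# Hypothetical helper (not part of SIAMCAT): assign samples to k folds so
# that each fold keeps roughly the class proportions of `labels`.
stratified_folds <- function(labels, k = 5, seed = 1) {
  set.seed(seed)  # note: this resets the global RNG state
  folds <- integer(length(labels))
  for (cl in unique(labels)) {
    idx <- which(labels == cl)
    # shuffle this class's samples, then deal them to folds round-robin
    idx <- idx[sample.int(length(idx))]
    folds[idx] <- rep_len(seq_len(k), length(idx))
  }
  folds
}

labels <- c(rep(-1, 20), rep(1, 10))
folds <- stratified_folds(labels, k = 5)
table(folds, labels)  # 4 samples of class -1 and 2 of class 1 in every fold
```

Without stratification, a random split of a small or unbalanced dataset can leave some folds with very few cases.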

modsel.crit

list, specifies the model selection criterion during internal cross-validation, may contain these: c('auc', 'f1', 'acc', 'pr'), defaults to list('auc')
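
As a rough illustration of three of these criteria, the hypothetical helper below scores predicted probabilities against labels coded 0/1 in base R. It assumes a 0.5 classification threshold for 'acc' and 'f1' and does not cover 'pr'. It is not how mlr computes its measures internally.

```r
# Hypothetical helper: compute AUC, accuracy and F1 for predicted
# probabilities `prob` and true labels `truth` coded 0 (control) / 1 (case).
selection_metrics <- function(prob, truth, threshold = 0.5) {
  pos <- prob[truth == 1]
  neg <- prob[truth == 0]
  # AUC as the Mann-Whitney statistic: fraction of (case, control) pairs
  # in which the case scores higher, counting ties as one half
  auc <- mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
  pred <- as.integer(prob >= threshold)
  tp <- sum(pred == 1 & truth == 1)
  fp <- sum(pred == 1 & truth == 0)
  fn <- sum(pred == 0 & truth == 1)
  c(auc = auc,
    acc = mean(pred == truth),
    f1  = 2 * tp / (2 * tp + fp + fn))
}

prob  <- c(0.9, 0.8, 0.4, 0.3, 0.6, 0.1)
truth <- c(1, 1, 1, 0, 0, 0)
m <- selection_metrics(prob, truth)
m
```

For these six samples the AUC is 8/9, because 8 of the 9 case/control pairs are ranked correctly. Accuracy and F1 are both 4/6. AUC is threshold-free, while 'acc' and 'f1' depend on the chosen cut-off.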

min.nonzero.coeff

integer, minimum number of nonzero coefficients that should be present in the model (only for 'lasso', 'ridge', and 'enet'), defaults to 1

param.set

list, set of extra parameters for the mlr run, may contain:

  • cost and class.weights - for lasso_ll and ridge_ll

  • alpha - for enet

  • ntree and mtry - for randomForest.

See below for details. Defaults to NULL

perform.fs

boolean, should feature selection be performed? Defaults to FALSE

param.fs

list, parameters for the feature selection, must contain:

  • thres.fs - threshold for the feature selection

  • method.fs - method for the feature selection, may be AUC, gFC, or Wilcoxon

  • direction - for AUC and gFC, select either the top associated features (independent of the sign of enrichment), the top positively associated features, or the top negatively associated features; may be absolute, positive, or negative. Ignored for Wilcoxon.

See Details for more information. Defaults to list(thres.fs=100, method.fs="AUC", direction='absolute')

feature.type

string, on which type of features should the function work? Can be either "original", "filtered", or "normalized". Please only change this parameter if you know what you are doing!

verbose

integer, control output: 0 for no output at all, 1 for only information about progress and success, 2 for normal level of information and 3 for full debug information, defaults to 1

Details

This function performs the training of the machine learning model and serves as an interface to the mlr package.

The function expects a siamcat-class object with a prepared cross-validation (see create.data.split) in its data_split slot. It then trains one model for each fold of the data split.
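
The per-fold procedure can be pictured with the base-R sketch below. It rests on several stated assumptions. Simulated data replace the siamcat object. A simple fold vector replaces the data_split slot. Logistic regression via glm() replaces the mlr learners. Each model is fit only on the samples outside its own fold.

```r
# Sketch only: simulated data and glm() stand in for the siamcat object
# and the mlr learners used by train.model().
set.seed(42)
n <- 60
x <- data.frame(f1 = rnorm(n), f2 = rnorm(n))
y <- as.integer(x$f1 + rnorm(n, sd = 0.5) > 0)

# hypothetical fold vector playing the role of the data_split slot
k <- 3
fold <- sample(rep_len(seq_len(k), n))

# one model per fold, each trained on all samples outside that fold
model_list <- lapply(seq_len(k), function(i) {
  train <- fold != i
  glm(y ~ f1 + f2, data = data.frame(x[train, ], y = y[train]),
      family = binomial())
})
length(model_list)
```

Keeping one model per fold means each model can later be evaluated on samples it never saw during training.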

For the machine learning methods that require additional hyperparameters (e.g. lasso_ll), the optimal hyperparameters are tuned with the function tuneParams within the mlr-package.

The different machine learning methods are implemented as mlr-tasks.

Hyperparameters

You also have additional control over the machine learning procedure by supplying information through the param.set parameter within the function. We encourage you to check out the excellent mlr documentation for more in-depth information.

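The parameters that can be supplied through param.set for each method are:

  • cost and class.weights - for lasso_ll and ridge_ll

  • alpha - for enet

  • ntree and mtry - for randomForest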

Feature selection

The function can also perform feature selection on each individual fold. At the moment, three methods for feature selection are implemented: AUC, gFC, and Wilcoxon.

For AUC and gFC, feature selection can also be directed. This means that features are selected either by their overall association or by association in a specific direction:

  • absolute - overall association; gFC values are converted to absolute values and AUC values below 0.5 are converted to 1 - AUC

  • positive - positive enrichment, as measured by positive gFC values or AUC values above 0.5

  • negative - negative enrichment, as measured by negative gFC values or AUC values below 0.5
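
The direction rule can be sketched in base R as follows. The helper select_directed() is hypothetical and is not SIAMCAT's internal code. It assumes that per-feature AUC or gFC values have already been computed. It ranks those values according to direction and keeps the names of the top thres.fs features.

```r
# Hypothetical helper: `scores` is a named numeric vector of per-feature
# AUC or gFC values; returns the names of the top `thres.fs` features.
select_directed <- function(scores, type = c("AUC", "gFC"),
                            direction = c("absolute", "positive", "negative"),
                            thres.fs = 100) {
  type <- match.arg(type)
  direction <- match.arg(direction)
  ranking <- switch(direction,
    # absolute: fold AUC around 0.5 (1 - AUC below 0.5), gFC around 0
    absolute = if (type == "AUC") pmax(scores, 1 - scores) else abs(scores),
    positive = scores,   # largest AUC / gFC first
    negative = -scores)  # smallest AUC / most negative gFC first
  keep <- order(ranking, decreasing = TRUE)[seq_len(min(thres.fs, length(scores)))]
  names(scores)[keep]
}

auc <- c(a = 0.9, b = 0.2, c = 0.55, d = 0.4)
select_directed(auc, "AUC", "absolute", 2)  # "a" "b"
select_directed(auc, "AUC", "positive", 2)  # "a" "c"
select_directed(auc, "AUC", "negative", 2)  # "b" "d"
```

Feature b (AUC 0.2) is strongly depleted. It is kept under absolute and negative but dropped under positive.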

Value

object of class siamcat-class with added model_list

Examples

data(siamcat_example)

# simple working example
siamcat_example <- train.model(siamcat_example, method='lasso')

SIAMCAT documentation built on Nov. 8, 2020, 5:14 p.m.