trainModel: Model training and performance using cross-validation


Description

Train a model and estimate model performance using multiple resamplings, each divided into training and independent testing subsets. Training subsets are further divided into k-fold cross-validation samples for model tuning. Testing samples are used for the independent validation of the final model. This procedure is repeated for each resampling provided.

Usage

## S4 method for signature 'GPM'
trainModel(x, n_var = NULL, mthd = "rf", mode = c("rfe",
  "ffs"), seed_nbr = 11, cv_nbr = NULL, var_selection = c("sd", "indv"),
  metric = NULL, tune_length = NULL, response_nbr = NULL,
  resample_nbr = NULL, filepath_tmp = NULL, rerank = FALSE, ...)

## S4 method for signature 'data.frame'
trainModel(x, response, predictor, selector, meta,
  resamples, n_var = NULL, mthd = "rf", mode = c("rfe", "ffs"),
  seed_nbr = 11, cv_nbr = 2, var_selection = c("sd", "indv"),
  metric = NULL, tune_length = NULL, response_nbr = NULL,
  resample_nbr = NULL, filepath_tmp = NULL, rerank = FALSE, ...)

Arguments

x

An object of class gpm or data.frame

n_var

Vector holding the number of variables used for the recursive feature elimination iterations; the vector does not need to be continuous (e.g. c(1:10, 20, 30))

mthd

Core method used for the model (e.g. "rf" for random forest)

mode

Variable selection mode, either recursive feature elimination ("rfe") or forward feature selection ("ffs")

seed_nbr

Specific seed to be set to ensure reproducibility

cv_nbr

Number of cross-validation folds to be used for model tuning within each forward or backward feature selection/elimination step

var_selection

Select final number of variables based on a standard deviation statistic ("sd", more conservative) or by the actual best number ("indv")

metric

The metric to be used to compute the model performance.

tune_length

Tune length to be used in recursive feature elimination (if NULL, the fixed default grid taken from the GPM LUT will be used).

response_nbr

Response ID to be computed; only relevant if more than one response variable is present and a model should not be built for each of them

resample_nbr

Resample ID to be computed; only relevant if the model training should not run over all resamples

filepath_tmp

If set, intermediate model results during the variable selection are written to disk; if the procedure stops for some reason, the already computed results can be read in again, which saves computation time (e.g. after an accidental shutdown etc.)

...

Additional arguments passed to trainModelffs.

response

The column name(s) of the response variable(s)

predictor

The column ID of the predictor, i.e. independent variable(s) in the dataset

selector

Selector id

meta

Meta information of the gpm object

resamples

The list of the resamples containing the individual row numbers (resulting from function resamplingsByVariable)

Details

The backward feature selection is based on the implementation of the caret::rfe function. The forward feature selection is implemented from scratch; it stops once the error statistics get worse after a first optimum has been reached. The respective caret functions are also used for model training.
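The forward-selection stopping rule described above can be sketched in plain R. This is an illustrative sketch only, not the package's actual implementation; `ffs_sketch`, `candidates`, and `score` are hypothetical names introduced here for illustration.

```r
# Illustrative sketch of the forward feature selection logic: add one
# variable at a time, keep the best-scoring addition, and stop once the
# error statistics no longer improve after a first optimum is reached.
ffs_sketch <- function(candidates, score) {
  selected <- character(0)
  best_score <- Inf  # assumes lower is better (e.g. RMSE)
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    trial_scores <- sapply(remaining,
                           function(v) score(c(selected, v)))
    # stop if no candidate improves on the current optimum
    if (min(trial_scores) >= best_score) break
    best_score <- min(trial_scores)
    selected <- c(selected, remaining[which.min(trial_scores)])
  }
  list(variables = selected, score = best_score)
}
```

In the package itself, `score` corresponds to fitting the model via caret with the chosen `mthd` and evaluating the tuning `metric` under `cv_nbr`-fold cross-validation.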

Value

A gpm object with an additional layer containing the model training information, i.e. the trained model for each response variable and all resamplings.

References

The function uses functions from: Max Kuhn. Contributions from Jed Wing, Steve Weston, Andre Williams, Chris Keefer, Allan Engelhardt, Tony Cooper, Zachary Mayer, Brenton Kenkel, the R Core Team, Michael Benesty, Reynald Lescarbeau, Andrew Ziem, Luca Scrucca, Yuan Tang and Can Candan. (2016). caret: Classification and Regression Training. https://CRAN.R-project.org/package=caret

See Also

NONE

Examples

## Not run: 
# Hedged sketch based only on the signature documented above; `gpm_obj`
# is a hypothetical GPM object created elsewhere.
model <- trainModel(x = gpm_obj, n_var = c(1:10, 20, 30),
                    mthd = "rf", mode = "rfe",
                    seed_nbr = 11, var_selection = "sd")

## End(Not run)

environmentalinformatics-marburg/gpm documentation built on July 11, 2020, 11:12 a.m.