Train a model and estimate the model performance using multiple resamplings, each divided into a training and an independent testing subset. Training subsets are further divided into k-fold cross-validation samples for model tuning. Testing samples are used for the independent validation of the final model. This procedure is repeated for each resampling provided.
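The resampling layout described above can be sketched in base R. This is only an illustration, not the gpm implementation: the number of observations, the number of resamplings, and the 80/20 split ratio are assumptions chosen for the example.

```r
# Minimal sketch of the resampling layout (base R only, not the gpm code).
set.seed(11)          # mirrors the default seed_nbr = 11
n_obs <- 100          # hypothetical number of observations
resample_nbr <- 3     # hypothetical number of resamplings
cv_nbr <- 2           # folds used for tuning inside each training subset

resamples <- lapply(seq_len(resample_nbr), function(i) {
  # Split the rows into a training subset and an independent testing subset
  # (the 80/20 ratio is an assumption).
  train_rows <- sort(sample(n_obs, size = round(0.8 * n_obs)))
  test_rows <- setdiff(seq_len(n_obs), train_rows)
  # Assign each training row to one of cv_nbr folds for model tuning.
  fold_id <- sample(rep(seq_len(cv_nbr), length.out = length(train_rows)))
  list(train = train_rows, test = test_rows,
       folds = split(train_rows, fold_id))
})

str(resamples[[1]], max.level = 1)
```

Each list element holds the row numbers of one resampling; the testing rows never enter the folds used for tuning.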
## S4 method for signature 'GPM'
trainModel(x, n_var = NULL, mthd = "rf", mode = c("rfe",
"ffs"), seed_nbr = 11, cv_nbr = NULL, var_selection = c("sd", "indv"),
metric = NULL, tune_length = NULL, response_nbr = NULL,
resample_nbr = NULL, filepath_tmp = NULL, rerank = FALSE, ...)
## S4 method for signature 'data.frame'
trainModel(x, response, predictor, selector, meta,
resamples, n_var = NULL, mthd = "rf", mode = c("rfe", "ffs"),
seed_nbr = 11, cv_nbr = 2, var_selection = c("sd", "indv"),
metric = NULL, tune_length = NULL, response_nbr = NULL,
resample_nbr = NULL, filepath_tmp = NULL, rerank = FALSE, ...)
x |
An object of class gpm or data.frame |
n_var |
Vector holding the numbers of variables used for the recursive feature elimination iterations; it need not be continuous (e.g. c(1:10, 20, 30)) |
mthd |
Core method used for the model (e.g. "rf" for random forest) |
mode |
Variable selection mode, either recursive feature elimination ("rfe") or forward feature selection ("ffs") |
seed_nbr |
Seed to be set to ensure reproducibility |
cv_nbr |
Number of cross-validation folds to be used for model tuning within each forward or backward feature selection/elimination step |
var_selection |
Select final number of variables based on a standard deviation statistic ("sd", more conservative) or by the actual best number ("indv") |
metric |
The metric to be used to compute the model performance. |
tune_length |
Tune length to be used in recursive feature elimination (if NULL, the fixed default grid taken from the GPM LUT will be used). |
response_nbr |
Response ID to be computed; only relevant if more than one response variable is present and a model should not be built for each of them |
resample_nbr |
Resample ID to be computed; only relevant if the model training should not run over all resamples |
filepath_tmp |
If set, intermediate model results during the variable selection are written to disk; if the procedure stops for some reason (e.g. after an accidental shutdown), the already computed results can be read in again, which saves computation time |
... |
Additional arguments passed to |
response |
The column name(s) of the response variable(s) |
predictor |
The column ID of the predictor, i.e. independent variable(s) in the dataset |
selector |
Selector id |
meta |
Meta information of the gpm object |
resamples |
The list of the resamples containing the individual row
numbers (resulting from function |
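The difference between the two var_selection options can be illustrated with a small sketch. The results table is hypothetical, and the exact rule behind "sd" is an assumption here: the smallest number of variables whose error lies within one standard deviation of the lowest error. The package may compute it differently.

```r
# Hypothetical tuning results: mean error and its standard deviation across
# cross-validation folds for each candidate number of variables.
results <- data.frame(
  n_var   = c(1, 2, 3, 4, 5, 10, 20),
  rmse    = c(1.90, 1.40, 1.12, 1.05, 1.02, 1.00, 1.03),
  rmse_sd = c(0.20, 0.15, 0.12, 0.10, 0.10, 0.09, 0.11)
)

best <- which.min(results$rmse)

# "indv": take the number of variables with the lowest error.
n_indv <- results$n_var[best]

# "sd" (assumed rule): take the smallest number of variables whose error
# lies within one standard deviation of the lowest error.
threshold <- results$rmse[best] + results$rmse_sd[best]
n_sd <- min(results$n_var[results$rmse <= threshold])

c(indv = n_indv, sd = n_sd)
```

Under this assumed rule, "sd" never selects more variables than "indv", which is the sense in which it is the more conservative choice.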
The backward feature selection is based on the caret::rfe function. The forward feature selection is implemented from scratch; it stops if the error statistics get worse after a first optimum is reached. For model training, the respective caret functions are used as well.
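The stopping behaviour of a forward feature selection can be sketched as follows. This is a simplified illustration in base R, not the package's code: the simulated data, the linear model, the single hold-out split, and the helper rmse_for are all assumptions made for the example.

```r
# Sketch of greedy forward selection with early stopping (base R, linear model).
set.seed(11)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- 2 * dat$x1 - dat$x2 + rnorm(n, sd = 0.5)   # x3 and x4 are pure noise

train <- dat[1:150, ]
valid <- dat[151:200, ]

# Hypothetical helper: hold-out RMSE of a linear model using the given predictors.
rmse_for <- function(vars) {
  fit <- lm(reformulate(vars, response = "y"), data = train)
  sqrt(mean((valid$y - predict(fit, newdata = valid))^2))
}

selected <- character(0)
candidates <- c("x1", "x2", "x3", "x4")
best_rmse <- Inf

while (length(candidates) > 0) {
  # Score every remaining candidate when added to the current selection.
  scores <- vapply(candidates, function(v) rmse_for(c(selected, v)), numeric(1))
  # Stop as soon as no candidate improves on the best error so far.
  if (min(scores) >= best_rmse) break
  best_rmse <- min(scores)
  selected <- c(selected, names(which.min(scores)))
  candidates <- setdiff(candidates, selected)
}

selected
best_rmse
```

The loop adds one variable per step and ends at the first step that fails to lower the validation error, so variables beyond the first optimum are never evaluated in larger combinations.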
NONE
A layer within the gpm object with the model training information for each response variable and all resamplings.
Trained model for each response variable and all resamplings.
The function uses functions from: Max Kuhn. Contributions from Jed Wing, Steve Weston, Andre Williams, Chris Keefer, Allan Engelhardt, Tony Cooper, Zachary Mayer, Brenton Kenkel, the R Core Team, Michael Benesty, Reynald Lescarbeau, Andrew Ziem, Luca Scrucca, Yuan Tang and Can Candan. (2016). caret: Classification and Regression Training. https://CRAN.R-project.org/package=caret
NONE
## Not run:
#Not run
## End(Not run)