rfeControl: Controlling the Feature Selection Algorithms

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/rfe.R

Description

This function generates a control object that can be used to specify the details of the feature selection algorithms used in this package.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
rfeControl(
  functions = NULL,
  rerank = FALSE,
  method = "boot",
  saveDetails = FALSE,
  number = ifelse(method %in% c("cv", "repeatedcv"), 10, 25),
  repeats = ifelse(method %in% c("cv", "repeatedcv"), 1, number),
  verbose = FALSE,
  returnResamp = "final",
  p = 0.75,
  index = NULL,
  indexOut = NULL,
  timingSamps = 0,
  seeds = NA,
  allowParallel = TRUE
)

Arguments

functions

a list of functions for model fitting, prediction and variable importance (see Details below)

rerank

a logical: should variable importance be re-calculated each time features are removed?

method

The external resampling method: boot, cv, LOOCV or LGOCV (for repeated training/test splits

saveDetails

a logical to save the predictions and variable importances from the selection process

number

Either the number of folds or number of resampling iterations

repeats

For repeated k-fold cross-validation only: the number of complete sets of folds to compute

verbose

a logical to print a log for each external resampling iteration

returnResamp

A character string indicating how much of the resampled summary metrics should be saved. Values can be “final”, “all” or “none”

p

For leave-group out cross-validation: the training percentage

index

a list with elements for each external resampling iteration. Each list element is the sample rows used for training at that iteration.

indexOut

a list (the same length as index) that dictates which sample are held-out for each resample. If NULL, then the unique set of samples not contained in index is used.

timingSamps

the number of training set samples that will be used to measure the time for predicting samples (zero indicates that the prediction time should not be estimated).

seeds

an optional set of integers that will be used to set the seed at each resampling iteration. This is useful when the models are run in parallel. A value of NA will stop the seed from being set within the worker processes while a value of NULL will set the seeds using a random set of integers. Alternatively, a list can be used. The list should have B+1 elements where B is the number of resamples. The first B elements of the list should be vectors of integers of length P where P is the number of subsets being evaluated (including the full set). The last element of the list only needs to be a single integer (for the final model). See the Examples section below.

allowParallel

if a parallel backend is loaded and available, should the function use it?

Details

More details on this function can be found at http://topepo.github.io/caret/recursive-feature-elimination.html#rfe.

Backwards selection requires function to be specified for some operations.

The fit function builds the model based on the current data set. The arguments for the function must be:

The function should return a model object that can be used to generate predictions.

The pred function returns a vector of predictions (numeric or factors) from the current model. The arguments are:

The rank function is used to return the predictors in the order of the most important to the least important. Inputs are:

The function should return a data frame with a column called var that has the current variable names. The first row should be the most important predictor etc. Other columns can be included in the output and will be returned in the final rfe object.

The selectSize function determines the optimal number of predictors based on the resampling output. Inputs for the function are:

This function should return an integer corresponding to the optimal subset size. caret comes with two examples functions for this purpose: pickSizeBest and pickSizeTolerance.

After the optimal subset size is determined, the selectVar function will be used to calculate the best rankings for each variable across all the resampling iterations. Inputs for the function are:

This function should return a character string of predictor names (of length size) in the order of most important to least important

Examples of these functions are included in the package: lmFuncs, rfFuncs, treebagFuncs and nbFuncs.

Model details about these functions, including examples, are at http://topepo.github.io/caret/recursive-feature-elimination.html. .

Value

A list

Author(s)

Max Kuhn

See Also

rfe, lmFuncs, rfFuncs, treebagFuncs, nbFuncs, pickSizeBest, pickSizeTolerance

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
  ## Not run: 
subsetSizes <- c(2, 4, 6, 8)
set.seed(123)
seeds <- vector(mode = "list", length = 51)
for(i in 1:50) seeds[[i]] <- sample.int(1000, length(subsetSizes) + 1)
seeds[[51]] <- sample.int(1000, 1)

set.seed(1)
rfMod <- rfe(bbbDescr, logBBB,
             sizes = subsetSizes,
             rfeControl = rfeControl(functions = rfFuncs,
                                     seeds = seeds,
                                     number = 50))
  
## End(Not run)

caret documentation built on Oct. 9, 2021, 5:09 p.m.