rfeControl | R Documentation |

This function generates a control object that can be used to specify the details of the feature selection algorithms used in this package.

    rfeControl(
      functions = NULL,
      rerank = FALSE,
      method = "boot",
      saveDetails = FALSE,
      number = ifelse(method %in% c("cv", "repeatedcv"), 10, 25),
      repeats = ifelse(method %in% c("cv", "repeatedcv"), 1, number),
      verbose = FALSE,
      returnResamp = "final",
      p = 0.75,
      index = NULL,
      indexOut = NULL,
      timingSamps = 0,
      seeds = NA,
      allowParallel = TRUE
    )

`functions` |
a list of functions for model fitting, prediction and variable importance (see Details below) |

`rerank` |
a logical: should variable importance be re-calculated each time features are removed? |

`method` |
The external resampling method: |

`saveDetails` |
a logical to save the predictions and variable importances from the selection process |

`number` |
Either the number of folds or number of resampling iterations |

`repeats` |
For repeated k-fold cross-validation only: the number of complete sets of folds to compute |

`verbose` |
a logical to print a log for each external resampling iteration |

`returnResamp` |
A character string indicating how much of the resampled summary metrics should be saved. Values can be “final”, “all” or “none” |

`p` |
For leave-group out cross-validation: the training percentage |

`index` |
a list with elements for each external resampling iteration. Each list element is the sample rows used for training at that iteration. |

`indexOut` |
a list (the same length as |

`timingSamps` |
the number of training set samples that will be used to measure the time for predicting samples (zero indicates that the prediction time should not be estimated). |

`seeds` |
an optional set of integers that will be used to set the seed
at each resampling iteration. This is useful when the models are run in
parallel. A value of |

`allowParallel` |
if a parallel backend is loaded and available, should the function use it? |
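As a rough illustration of the `index` argument, the sketch below builds training-row indices for a hypothetical 5-fold split of 20 rows using only base R. The names `n_rows`, `k` and `fold_id` are illustrative assumptions, and the held-out rows are simply taken as the complement of each training set.

```r
# Hypothetical data size and fold count, chosen only for illustration
n_rows <- 20
k <- 5

set.seed(42)
# Randomly assign each row to one of k folds (4 rows per fold here)
fold_id <- sample(rep(seq_len(k), length.out = n_rows))

# One list element per resampling iteration: the rows used for training
index <- lapply(seq_len(k), function(f) which(fold_id != f))
names(index) <- paste0("Fold", seq_len(k))

# Matching held-out rows: everything not used for training in that iteration
indexOut <- lapply(index, function(train_rows) setdiff(seq_len(n_rows), train_rows))

# These lists could then be supplied as
#   rfeControl(method = "cv", index = index, indexOut = indexOut)
```

Supplying the indices explicitly makes the resampling splits reproducible regardless of how the random number stream is consumed elsewhere.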

More details on this function can be found at http://topepo.github.io/caret/recursive-feature-elimination.html#rfe.

Backwards selection requires functions to be specified for some operations.

The `fit` function builds the model based on the current data set. The arguments for the function must be:

`x`

the current training set of predictor data with the appropriate subset of variables

`y`

the current outcome data (either a numeric or factor vector)

`first`

a single logical value for whether the current predictor set has all possible variables

`last`

similar to `first`, but `TRUE` when the last model is fit with the final subset size and predictors.

`...`

optional arguments to pass to the fit function in the call to `rfe`

The function should return a model object that can be used to generate predictions.
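A minimal sketch of a function with this signature, assuming a linear regression is the model of interest. The name `fit_lm` and the temporary column `.outcome` are hypothetical, and the `first` and `last` flags are accepted but not used here.

```r
# Hypothetical fit function with the argument signature described above
fit_lm <- function(x, y, first, last, ...) {
  # Combine predictors and outcome into one data frame for lm()
  dat <- as.data.frame(x)
  dat$.outcome <- y
  # Return a model object that predict() can later be called on
  lm(.outcome ~ ., data = dat, ...)
}

# Usage on a built-in data set: two predictors, numeric outcome
model <- fit_lm(mtcars[, c("wt", "hp")], mtcars$mpg, first = TRUE, last = FALSE)
```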

The `pred` function returns a vector of predictions (numeric or factors) from the current model. The arguments are:

`object`

the model generated by the `fit` function

`x`

the current set of predictors for the held-back samples
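A short sketch of a matching prediction function for a linear model, using only base R. The name `pred_lm` and the train/held-back split of `mtcars` are illustrative assumptions.

```r
# Hypothetical prediction function: returns a plain numeric vector
pred_lm <- function(object, x) {
  unname(predict(object, newdata = as.data.frame(x)))
}

# Usage: fit on the first 24 rows, predict the remaining 8 held-back rows
train <- mtcars[1:24, c("mpg", "wt", "hp")]
held_back <- mtcars[25:32, c("wt", "hp")]
model <- lm(mpg ~ ., data = train)
preds <- pred_lm(model, held_back)
```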

The `rank` function is used to return the predictors in the order of the most important to the least important. Inputs are:

`object`

the model generated by the `fit` function

`x`

the current set of predictors for the training samples

`y`

the current training outcomes

The function should return a data frame with a column called `var` that has the current variable names. The first row should be the most important predictor, and so on. Other columns can be included in the output and will be returned in the final `rfe` object.
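One possible ranking function for a linear model is sketched below. Using the absolute t-statistic as the importance score is an assumption made for illustration, as are the names `rank_lm` and the extra column `Overall`. The sketch also assumes numeric predictors, so that coefficient names match the column names of `x`.

```r
# Hypothetical ranking function: orders predictors by |t statistic|
rank_lm <- function(object, x, y) {
  coefs <- summary(object)$coefficients
  # Drop the intercept so only predictors are ranked
  coefs <- coefs[rownames(coefs) != "(Intercept)", , drop = FALSE]
  out <- data.frame(
    var = rownames(coefs),
    Overall = unname(abs(coefs[, "t value"])),
    stringsAsFactors = FALSE
  )
  # Most important predictor goes in the first row
  out <- out[order(out$Overall, decreasing = TRUE), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Usage with three numeric predictors from a built-in data set
x <- mtcars[, c("wt", "hp", "qsec")]
y <- mtcars$mpg
model <- lm(y ~ ., data = cbind(x, y = y))
ranking <- rank_lm(model, x, y)
```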

The `selectSize` function determines the optimal number of predictors based on the resampling output. Inputs for the function are:

`x`

a matrix with columns for the performance metrics and the number of variables, called "`Variables`"

`metric`

a character string of the performance measure to optimize (e.g. "RMSE", "Rsquared", "Accuracy" or "Kappa")

`maximize`

a single logical for whether the metric should be maximized

This function should return an integer corresponding to the optimal subset size. caret comes with two example functions for this purpose: `pickSizeBest` and `pickSizeTolerance`.
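The inputs and return value can be sketched with a simple rule that picks the subset size with the single best metric value. The function name `select_size_best` and the performance numbers in `perf` are made up for illustration; this is not the package's own implementation.

```r
# Hypothetical selectSize-style function: choose the size with the best metric
select_size_best <- function(x, metric, maximize) {
  values <- x[, metric]
  best <- if (maximize) which.max(values) else which.min(values)
  as.integer(x[best, "Variables"])
}

# Made-up resampling summary: one row per subset size
perf <- cbind(
  Variables = c(2, 4, 6, 8),
  RMSE      = c(1.9, 1.4, 1.5, 1.6),
  Rsquared  = c(0.55, 0.70, 0.72, 0.69)
)

size_rmse <- select_size_best(perf, "RMSE", maximize = FALSE)
size_r2   <- select_size_best(perf, "Rsquared", maximize = TRUE)
```

With these made-up numbers the two metrics disagree (4 predictors for RMSE, 6 for R-squared), which shows why the choice of `metric` and `maximize` matters.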

After the optimal subset size is determined, the `selectVar` function will be used to calculate the best rankings for each variable across all the resampling iterations. Inputs for the function are:

`y`

a list of variable importances for each resampling iteration and each subset size (generated by the user-defined `rank` function). In the example, for each of the cross-validation groups the output of the `rank` function is saved for each of the subset sizes (including the original subset). If the rankings are not recomputed at each iteration, the values will be the same within each cross-validation iteration.

`size`

the integer returned by the `selectSize` function

This function should return a character vector of predictor names (of length `size`) in the order of most important to least important.
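A sketch of one way such a function could work: average each predictor's importance over all saved rankings and keep the top `size` names. It assumes `y` arrives as a list of data frames that each contain `var` and a numeric importance column named `Overall`; that column name, the function name `select_var_mean` and the toy importances are assumptions for illustration.

```r
# Hypothetical selectVar-style function: rank predictors by mean importance
select_var_mean <- function(y, size) {
  # Stack the per-iteration rankings into one data frame
  stacked <- do.call(rbind, y)
  # Average importance per predictor across all iterations and subset sizes
  mean_imp <- tapply(stacked$Overall, as.character(stacked$var), mean, na.rm = TRUE)
  vars <- names(mean_imp)[order(mean_imp, decreasing = TRUE)]
  # Keep the `size` most important predictor names
  head(vars, size)
}

# Toy importances from two resampling iterations
imp <- list(
  data.frame(var = c("wt", "hp", "qsec"), Overall = c(5, 3, 1)),
  data.frame(var = c("hp", "wt", "qsec"), Overall = c(4, 3.5, 2))
)
top_vars <- select_var_mean(imp, size = 2)
```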

Examples of these functions are included in the package: `lmFuncs`, `rfFuncs`, `treebagFuncs` and `nbFuncs`.

More details about these functions, including examples, are at http://topepo.github.io/caret/recursive-feature-elimination.html.

A list

Max Kuhn

`rfe`, `lmFuncs`, `rfFuncs`, `treebagFuncs`, `nbFuncs`, `pickSizeBest`, `pickSizeTolerance`

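    ## Not run:
    library(caret)
    data(BloodBrain)  # provides bbbDescr (predictors) and logBBB (outcome)

    subsetSizes <- c(2, 4, 6, 8)

    ## One vector of seeds per resample (50 here), each holding one seed per
    ## subset size plus one for the full predictor set; the last list element
    ## is a single seed for the final model
    set.seed(123)
    seeds <- vector(mode = "list", length = 51)
    for(i in 1:50) seeds[[i]] <- sample.int(1000, length(subsetSizes) + 1)
    seeds[[51]] <- sample.int(1000, 1)

    set.seed(1)
    rfMod <- rfe(bbbDescr, logBBB,
                 sizes = subsetSizes,
                 rfeControl = rfeControl(functions = rfFuncs,
                                         seeds = seeds,
                                         number = 50))
    ## End(Not run)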
