sbfControl | R Documentation |
Controls the execution of models with simple filters for feature selection
sbfControl(
  functions = NULL,
  method = "boot",
  saveDetails = FALSE,
  number = ifelse(method %in% c("cv", "repeatedcv"), 10, 25),
  repeats = ifelse(method %in% c("cv", "repeatedcv"), 1, number),
  verbose = FALSE,
  returnResamp = "final",
  p = 0.75,
  index = NULL,
  indexOut = NULL,
  timingSamps = 0,
  seeds = NA,
  allowParallel = TRUE,
  multivariate = FALSE
)
functions |
a list of functions for model fitting, prediction and variable filtering (see Details below) |
method |
The external resampling method: |
saveDetails |
a logical to save the predictions and variable importances from the selection process |
number |
Either the number of folds or number of resampling iterations |
repeats |
For repeated k-fold cross-validation only: the number of complete sets of folds to compute |
verbose |
a logical to print a log for each external resampling iteration |
returnResamp |
A character string indicating how much of the resampled summary metrics should be saved. Values can be “final” or “none” |
p |
For leave-group-out cross-validation: the proportion of samples used for training (e.g. 0.75 for 75%) |
index |
a list with elements for each external resampling iteration. Each list element is the sample rows used for training at that iteration. |
indexOut |
a list (the same length as |
timingSamps |
the number of training set samples that will be used to measure the time for predicting samples (zero indicates that the prediction time should not be estimated). |
seeds |
an optional set of integers that will be used to set the seed
at each resampling iteration. This is useful when the models are run in
parallel. A value of |
allowParallel |
if a parallel backend is loaded and available, should the function use it? |
multivariate |
a logical; should all the columns of |
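As a rough illustration of the index and indexOut arguments, the base-R sketch below builds five sets of training rows for a hypothetical data set of 20 samples and derives the matching held-out rows. The sample size, the number of resamples and the "Resample" labels are assumptions made for the example, and it assumes indexOut is meant to hold the rows held out at each iteration.

```r
set.seed(42)
n_samples   <- 20   # hypothetical number of rows in the data set
n_resamples <- 5    # hypothetical number of external resampling iterations

# Each element holds the row numbers used for training in one iteration.
index <- lapply(seq_len(n_resamples), function(i) {
  sort(sample.int(n_samples, size = 15))
})
names(index) <- paste0("Resample", seq_len(n_resamples))

# Matching held-out rows: everything not used for training in that iteration.
indexOut <- lapply(index, function(train_rows) {
  setdiff(seq_len(n_samples), train_rows)
})
```

The two lists could then be supplied as sbfControl(index = index, indexOut = indexOut).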
More details on this function can be found at http://topepo.github.io/caret/feature-selection-using-univariate-filters.html.
Simple filter-based feature selection requires functions to be specified for some operations.
The fit function builds the model based on the current data set. The arguments for the function must be:
x
the current training set of predictor data with the appropriate subset of variables (i.e. after filtering)
y
the current outcome data (either a numeric or factor vector)
...
optional arguments to pass to the fit function in the call to sbf
The function should return a model object that can be used to generate predictions.
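A minimal sketch of a function with this signature is shown here, using ordinary least squares from base R on the built-in mtcars data. The name exampleFit and the .outcome column are hypothetical, and the sketch does not handle the case where filtering removes every predictor.

```r
# Hypothetical fit function: least-squares regression on the filtered predictors.
exampleFit <- function(x, y, ...) {
  dat <- as.data.frame(x)
  dat$.outcome <- y          # temporary outcome column (illustrative name)
  lm(.outcome ~ ., data = dat, ...)
}

# Toy data standing in for an already-filtered training set.
x_train <- mtcars[, c("wt", "hp")]
y_train <- mtcars$mpg

model <- exampleFit(x_train, y_train)
```

The returned lm object can be passed to predict(), which is what a companion pred function needs.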
The pred function returns a vector of predictions (numeric or factors) from the current model. The arguments are:
object
the model generated by the fit function
x
the current set of predictors for the held-back samples
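The following sketch shows one way such a function might look for a linear model, again on mtcars. The name examplePred and the 24/8 row split are assumptions for the illustration.

```r
# A small model fit on the first 24 rows, standing in for the output of fit.
model <- lm(mpg ~ wt + hp, data = mtcars[1:24, ])

# Hypothetical pred function: a plain numeric vector of predictions.
examplePred <- function(object, x) {
  unname(predict(object, newdata = as.data.frame(x)))
}

# Held-back samples contain only the predictors that the model was fit on.
x_held_back <- mtcars[25:32, c("wt", "hp")]
preds <- examplePred(model, x_held_back)
```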
The score function is used to return scores with names for each predictor (such as a p-value). Inputs are:
x
the predictors for the training samples. If sbfControl()$multivariate is TRUE, this will be the full predictor matrix. Otherwise it is a vector for a specific predictor.
y
the current training outcomes
When sbfControl()$multivariate is TRUE, the score function should return a named vector where length(scores) == ncol(x). Otherwise, the function's output should be a single value. Univariate examples are given by anovaScores for classification and gamScores for regression and the example below.
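To make the two calling conventions concrete, here is a hedged base-R sketch that scores predictors by the p-value of a Pearson correlation test. The names exampleScore and exampleScoreMV are hypothetical, and a correlation test is only one of many possible scores.

```r
# Univariate form (multivariate = FALSE): x is a single predictor vector,
# and the function returns a single value.
exampleScore <- function(x, y) {
  cor.test(x, y)$p.value
}

# Multivariate form (multivariate = TRUE): x holds all predictors at once,
# and the function returns one named score per column.
exampleScoreMV <- function(x, y) {
  vapply(as.data.frame(x),
         function(column) cor.test(column, y)$p.value,
         numeric(1))
}

x_train <- mtcars[, c("wt", "hp", "qsec")]
y_train <- mtcars$mpg

one_score  <- exampleScore(x_train$wt, y_train)
all_scores <- exampleScoreMV(x_train, y_train)
```

In this sketch all_scores carries the column names of x_train, so length(all_scores) == ncol(x_train) holds.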
The filter function is used to return a logical vector with names for each predictor (TRUE indicates that the predictor should be retained). Inputs are:
score
the output of the score function
x
the predictors for the training samples
y
the current training outcomes
The function should return a named logical vector.
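A simple threshold rule illustrates the expected input and output. In this sketch the name exampleFilter, the 0.05 cutoff and the treatment of missing scores are all assumptions.

```r
# Hypothetical filter: keep predictors whose score (a p-value) is at most 0.05.
exampleFilter <- function(score, x, y) {
  keep <- score <= 0.05        # the comparison preserves the names of score
  keep[is.na(keep)] <- FALSE   # drop predictors whose score could not be computed
  keep
}

scores   <- c(wt = 0.001, hp = 0.2, qsec = NA)
retained <- exampleFilter(scores, x = NULL, y = NULL)
```

Here x and y are unused, but they remain in the signature so that a filter can take the data into account if needed.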
Examples of these functions are included in the package: caretSBF, lmSBF, rfSBF, treebagSBF, ldaSBF and nbSBF.
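To show how the four pieces relate, the sketch below chains hypothetical score, filter, fit and pred functions over a single training/held-back split in base R. It is an illustration of the data flow under stated assumptions (mtcars data, a 24/8 split, a correlation-test score), not caret's internal code.

```r
# Hypothetical helper functions (illustrative names, not part of caret).
scoreFun  <- function(x, y) cor.test(x, y)$p.value
filterFun <- function(score, x, y) score <= 0.05
fitFun    <- function(x, y, ...) {
  dat <- as.data.frame(x)
  dat$.outcome <- y
  lm(.outcome ~ ., data = dat)
}
predFun   <- function(object, x) {
  unname(predict(object, newdata = as.data.frame(x)))
}

# One training / held-back split of a toy data set.
x <- mtcars[, c("wt", "hp", "qsec", "drat")]
y <- mtcars$mpg
train_rows <- 1:24
x_train <- x[train_rows, , drop = FALSE]
y_train <- y[train_rows]
x_test  <- x[-train_rows, , drop = FALSE]

# 1. Score each predictor on the training samples only.
scores <- vapply(x_train, scoreFun, numeric(1), y = y_train)
# 2. Decide which predictors to retain.
retained <- filterFun(scores, x_train, y_train)
# 3. Fit the model on the retained predictors.
model <- fitFun(x_train[, retained, drop = FALSE], y_train)
# 4. Predict the held-back samples using the same retained predictors.
preds <- predFun(model, x_test[, retained, drop = FALSE])
```

Scoring and filtering use only the training rows, so the held-back samples play no part in choosing the predictors.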
The web page http://topepo.github.io/caret/ has more details and examples related to this function.
a list that echoes the specified arguments
Max Kuhn
sbf, caretSBF, lmSBF, rfSBF, treebagSBF, ldaSBF and nbSBF
## Not run:
library(caret)
data(BloodBrain)

## Use a GAM as the filter, then fit a random forest model
set.seed(1)
RFwithGAM <- sbf(bbbDescr, logBBB,
                 sbfControl = sbfControl(functions = rfSBF,
                                         verbose = FALSE,
                                         seeds = sample.int(100000, 11),
                                         method = "cv"))
RFwithGAM

## A simple example for multivariate scoring:
## with multivariate = TRUE the score function receives every predictor
## column at once, so the univariate score is applied column by column
rfSBF2 <- rfSBF
rfSBF2$score <- function(x, y) apply(x, 2, rfSBF$score, y = y)

set.seed(1)
RFwithGAM2 <- sbf(bbbDescr, logBBB,
                  sbfControl = sbfControl(functions = rfSBF2,
                                          verbose = FALSE,
                                          seeds = sample.int(100000, 11),
                                          method = "cv",
                                          multivariate = TRUE))
RFwithGAM2
## End(Not run)