trainControl  R Documentation 
Control the computational nuances of the train function
trainControl(
  method = "boot",
  number = ifelse(grepl("cv", method), 10, 25),
  repeats = ifelse(grepl("[d_]cv$", method), 1, NA),
  p = 0.75,
  search = "grid",
  initialWindow = NULL,
  horizon = 1,
  fixedWindow = TRUE,
  skip = 0,
  verboseIter = FALSE,
  returnData = TRUE,
  returnResamp = "final",
  savePredictions = FALSE,
  classProbs = FALSE,
  summaryFunction = defaultSummary,
  selectionFunction = "best",
  preProcOptions = list(thresh = 0.95, ICAcomp = 3, k = 5, freqCut = 95/5,
    uniqueCut = 10, cutoff = 0.9),
  sampling = NULL,
  index = NULL,
  indexOut = NULL,
  indexFinal = NULL,
  timingSamps = 0,
  predictionBounds = rep(FALSE, 2),
  seeds = NA,
  adaptive = list(min = 5, alpha = 0.05, method = "gls", complete = TRUE),
  trim = FALSE,
  allowParallel = TRUE
)
method 
The resampling method: 
number 
Either the number of folds or number of resampling iterations 
repeats 
For repeated k-fold cross-validation only: the number of complete sets of folds to compute 
p 
For leave-group-out cross-validation: the training percentage 
search 
Either "grid" or "random"; see the notes on the search option below. 
initialWindow, horizon, fixedWindow, skip 
possible arguments to

verboseIter 
A logical for printing a training log. 
returnData 
A logical for saving the data 
returnResamp 
A character string indicating how much of the resampled
summary metrics should be saved. Values can be 
savePredictions 
an indicator of how much of the holdout predictions
for each resample should be saved. Values can be either 
classProbs 
a logical; should class probabilities be computed for classification models (along with predicted values) in each resample? 
summaryFunction 
a function to compute performance metrics across
resamples. The arguments to the function should be the same as those in

selectionFunction 
the function used to select the optimal tuning
parameter. This can be a name of the function or the function itself. See

preProcOptions 
A list of options to pass to 
sampling 
a single character value describing the type of additional
sampling that is conducted after resampling (usually to resolve class
imbalances). Values are 
index 
a list with elements for each resampling iteration. Each list element is a vector of integers corresponding to the rows used for training at that iteration. 
indexOut 
a list (the same length as 
indexFinal 
an optional vector of integers indicating which samples
are used to fit the final model after resampling. If 
timingSamps 
the number of training set samples that will be used to measure the time for predicting samples (zero indicates that the prediction time should not be estimated). 
predictionBounds 
a logical or numeric vector of length 2 (regression
only). If logical, the predictions can be constrained to be within the limit
of the training set outcomes. For example, a value of 
seeds 
an optional set of integers that will be used to set the seed
at each resampling iteration. This is useful when the models are run in
parallel. A value of 
adaptive 
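a list used when method is "adaptive_cv", "adaptive_boot" or "adaptive_LGOCV". See the description of the options below. 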
trim 
a logical. If 
allowParallel 
if a parallel backend is loaded and available, should the function use it? 
When setting the seeds manually, the number of models being evaluated is
required. This may not be obvious, as train performs some optimizations
for certain models. For example, when tuning a PLS model, the only model
that is fit is the one with the largest number of components. So if the
model is being tuned over ncomp in 1:10, the only model fit is
ncomp = 10. However, if the vector of integers used in the seeds
argument is longer than actually needed, no error is thrown.
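The layout used in the example at the end of this page (one vector of seeds per resample, plus a single seed for the final model) can be wrapped in a small helper. This is only a sketch: make_seeds is a hypothetical function, not part of caret, and the upper bound of 1000 for the sampled seeds is an arbitrary choice.

```r
# Hypothetical helper (not part of caret): build a seeds list for
# n_resamples resampling iterations and n_models candidate models.
make_seeds <- function(n_resamples, n_models, seed = 123) {
  set.seed(seed)
  seeds <- vector(mode = "list", length = n_resamples + 1)
  # One integer per candidate model for every resampling iteration
  for (i in seq_len(n_resamples)) {
    seeds[[i]] <- sample.int(1000, n_models)
  }
  # A single integer for the final model fit
  seeds[[n_resamples + 1]] <- sample.int(1000, 1)
  seeds
}

# 5 repeats of 10-fold CV = 50 resamples; 12 candidate models
seeds <- make_seeds(n_resamples = 50, n_models = 12)
```

Since supplying more seeds than needed does not raise an error, n_models can be set conservatively when the exact number of fitted models is unclear.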
Using method = "none" and specifying more than one model in train's
tuneGrid or tuneLength arguments will result in an error.
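As an illustration, a single model can be fit without resampling by passing a one-row tuneGrid. The sketch below assumes the caret package is installed; the choice of the iris data, a KNN model and k = 5 is arbitrary.

```r
library(caret)

# No resampling: exactly one candidate model must be specified
ctrl_none <- trainControl(method = "none")

set.seed(1)
fit_none <- train(Species ~ ., data = iris,
                  method = "knn",
                  tuneGrid = data.frame(k = 5),  # one row = one model
                  trControl = ctrl_none)
```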
With adaptive resampling, i.e. when method is either "adaptive_cv",
"adaptive_boot" or "adaptive_LGOCV", the full set of resamples
is not run for each model. As resampling continues, a futility analysis is
conducted and models with a low probability of being optimal are removed.
These features are experimental. See Kuhn (2014) for more details. The
options for this procedure are:
min: the minimum number of resamples used before models are removed
alpha: the confidence level of the one-sided intervals used to measure futility
method: either generalized least squares (method = "gls") or a Bradley-Terry model (method = "BT")
complete: if a single parameter value is found before the end of resampling, should the full set of resamples be computed for that parameter?
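These options are passed together as the adaptive list. A minimal sketch, assuming the caret package is installed; the values chosen here (the Bradley-Terry model and complete = FALSE) are only for illustration and are not recommendations.

```r
library(caret)

# Adaptive 10-fold CV repeated 5 times, with non-default futility options
ctrl_adapt <- trainControl(
  method = "adaptive_cv",
  number = 10,
  repeats = 5,
  adaptive = list(min = 5,          # resamples run before any model is dropped
                  alpha = 0.05,     # confidence level of the one-sided intervals
                  method = "BT",    # Bradley-Terry model instead of "gls"
                  complete = FALSE) # do not finish resampling for the winner
)
```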
The option search = "grid" uses the default grid search routine. When
search = "random", a random search procedure is used (Bergstra and
Bengio, 2012). See http://topepo.github.io/caret/randomhyperparametersearch.html for
details and an example.
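A short sketch of random search, assuming the caret package is installed. With search = "random", tuneLength is taken as the number of random parameter combinations to try; the model and data used here are arbitrary.

```r
library(caret)

# 5-fold CV with randomly sampled tuning parameter values
ctrl_rand <- trainControl(method = "cv", number = 5, search = "random")

set.seed(1)
fit_rand <- train(Species ~ ., data = iris,
                  method = "knn",
                  tuneLength = 5,   # up to 5 random values of k
                  trControl = ctrl_rand)
```

The results table may contain fewer rows than tuneLength if the random draw produces duplicate parameter values.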
The supported bootstrap methods are:
"boot": the usual bootstrap.
"boot632": the 0.632 bootstrap estimator (Efron, 1983).
"optimism_boot": the optimism bootstrap estimator (Efron and Tibshirani, 1994).
"boot_all": all of the above (for efficiency, but "boot" will be used for calculations).
The "boot632" method should not be confused with the 0.632+
estimator proposed later by the same author.
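Switching between these estimators only requires changing the method string. A minimal sketch, assuming the caret package is installed; 25 resamples matches the default for non-CV methods shown in the usage above.

```r
library(caret)

# Same number of bootstrap resamples, different estimators
ctrl_boot <- trainControl(method = "boot", number = 25)
ctrl_632  <- trainControl(method = "boot632", number = 25)
ctrl_opt  <- trainControl(method = "optimism_boot", number = 25)
```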
Note that if index or indexOut are specified, the label shown by train
may not be accurate since these arguments supersede the method argument.
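Custom resampling indices can be built with caret's createFolds. The sketch below assumes the caret package is installed; the variable names are illustrative, and method = "cv" is supplied only so that the label printed by train roughly matches the indices.

```r
library(caret)

set.seed(42)
# Rows used for training in each of 5 folds
train_rows <- createFolds(iris$Species, k = 5, returnTrain = TRUE)

# Matching holdout rows: everything not used for training in that fold
holdout_rows <- lapply(train_rows,
                       function(rows) setdiff(seq_len(nrow(iris)), rows))

ctrl_idx <- trainControl(method = "cv", number = 5,
                         index = train_rows,
                         indexOut = holdout_rows)
```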
An echo of the parameters specified
Max Kuhn
Efron (1983). “Estimating the error rate of a prediction rule: improvement on cross-validation”. Journal of the American Statistical Association, 78(382):316-331
Efron, B., & Tibshirani, R. J. (1994). “An introduction to the bootstrap”, pages 249-252. CRC press.
Bergstra and Bengio (2012), “Random Search for Hyper-Parameter Optimization”, Journal of Machine Learning Research, 13(Feb):281-305
Kuhn (2014), “Futility Analysis in the Cross-Validation of Machine Learning Models” https://arxiv.org/abs/1405.6974
Package website for subsampling: https://topepo.github.io/caret/subsamplingforclassimbalances.html
## Not run:
## Do 5 repeats of 10-fold CV for the iris data. We will fit
## a KNN model that evaluates 12 values of k and set the seed
## at each iteration.
library(caret)

set.seed(123)
## 50 resamples (5 repeats x 10 folds) plus one element for the final model
seeds <- vector(mode = "list", length = 51)
## Extra seeds beyond the 12 models evaluated are allowed
for(i in 1:50) seeds[[i]] <- sample.int(1000, 22)

## For the last model:
seeds[[51]] <- sample.int(1000, 1)

ctrl <- trainControl(method = "repeatedcv",
                     repeats = 5,
                     seeds = seeds)

set.seed(1)
mod <- train(Species ~ ., data = iris,
             method = "knn",
             tuneLength = 12,
             trControl = ctrl)

ctrl2 <- trainControl(method = "adaptive_cv",
                      repeats = 5,
                      verboseIter = TRUE,
                      seeds = seeds)

set.seed(1)
mod2 <- train(Species ~ ., data = iris,
              method = "knn",
              tuneLength = 12,
              trControl = ctrl2)
## End(Not run)