Performance evaluation and parameter tuning use resampling methods to estimate the performance of models. These are defined by resampling schemes, which are data frames where each column corresponds to a division of the data set into mutually exclusive training and test sets. Repeated holdout and cross-validation are two methods for creating such schemes.
resample(method, y, ..., subset = TRUE)
resample_holdout(y, test_fraction = 0.5, nfold = 5,
  balanced = is.factor(y), subset)
resample_crossvalidation(y, nfold = 5, nrepeat = 5,
  balanced = is.factor(y), subset)
resample_bootstrap(y, nfold = 10, fit_fraction = if (replace) 1 else 0.632,
  replace = TRUE, balanced = is.factor(y), subset)
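
The default fit_fraction = if (replace) 1 else 0.632 refers to replace, an argument that comes later in the signature. This works because R evaluates default arguments lazily. The sketch below reproduces only that pattern in a hypothetical stand-in function, bootstrap_defaults. Rounding the training set size to a whole number is an assumption made for illustration and is not taken from the package.

```r
# Hypothetical stand-in that mirrors only the default-argument pattern of
# resample_bootstrap; it is not the package's implementation.
bootstrap_defaults <- function(n, fit_fraction = if (replace) 1 else 0.632,
                               replace = TRUE) {
    # Defaults are evaluated lazily, so `replace` is already bound when
    # `fit_fraction` is first used on the next line.
    n_fit <- round(fit_fraction * n)  # assumed rounding rule
    list(fit_fraction = fit_fraction, n_fit = n_fit)
}

with_replacement <- bootstrap_defaults(100)
without_replacement <- bootstrap_defaults(100, replace = FALSE)
```

With replacement the training set defaults to the size of the full data set, and without replacement it defaults to 63.2 % of it.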

method 
The resampling method to use, e.g. 
y 
Observations to be divided. 
... 
Sent to the method specific function, e.g.

subset 
Which objects in 
test_fraction 
Fraction of objects to hold out (0 < test_fraction < 1). 
nfold 
Number of folds. 
balanced 
Whether the sets should be balanced or not, i.e. if the class ratio over the sets should be kept constant (as far as possible). 
nrepeat 
Number of fold sets to generate. 
fit_fraction 
The size of the training set relative to the entire data set. 
replace 
Whether to sample with replacement. 
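
To make the roles of test_fraction, nfold and balanced concrete, here is a minimal base R sketch of a repeated holdout scheme. The function holdout_sketch is hypothetical and is not the package's resample_holdout. It assumes that balancing means drawing the same fraction of test observations from each class, and that the per-class test count is rounded.

```r
# Hypothetical helper illustrating test_fraction, nfold and balanced.
holdout_sketch <- function(y, test_fraction = 0.5, nfold = 5,
                           balanced = is.factor(y)) {
    stopifnot(test_fraction > 0, test_fraction < 1)
    n <- length(y)
    # Group observation indices by class when balancing, else use one group.
    groups <- if (balanced) split(seq_len(n), y) else list(seq_len(n))
    folds <- lapply(seq_len(nfold), function(i) {
        fit <- rep(TRUE, n)  # TRUE = training set
        for (idx in groups) {
            n_test <- round(test_fraction * length(idx))  # assumed rounding
            test_idx <- idx[sample.int(length(idx), n_test)]
            fit[test_idx] <- FALSE  # FALSE = test set
        }
        fit
    })
    names(folds) <- paste0("fold", seq_len(nfold))
    as.data.frame(folds)
}

set.seed(1)
y <- factor(rep(c("a", "b"), times = c(20, 10)))
scheme <- holdout_sketch(y, test_fraction = 0.3, nfold = 3)
# Number of test observations per class (rows) and fold (columns).
test_counts <- sapply(scheme, function(fit) table(y[!fit]))
```

Each column of scheme is one division of the data, and test_counts shows that the 2:1 class ratio is preserved in every test set.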
Note that when setting up analyses, the user should not call resample_holdout or resample_crossvalidation directly, as resample performs additional necessary processing of the scheme.
A resampling scheme can be visualized in a human-readable form with the image function.
Functions for generating custom resampling schemes should be implemented as follows and then called by resample("myMethod", ...):
resample_myMethod <- function(y, ..., subset)
y
Response vector.
...
Method specific attributes.
subset
Indices of observations to be excluded from the resampling.
The function should return a list of the following elements:
folds
A data frame with the folds of the scheme that conforms to the description in the 'Value' section below.
parameter
A list with the parameters necessary to generate such a resampling scheme. These are needed when creating the subschemes used for parameter tuning, see subresample.
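
As an illustration of this contract, the sketch below defines a hypothetical custom method, resample_interleaved, that assigns observations to test sets in round-robin order. The method name and its nfold argument are invented for the example. The handling of subset follows the argument description above (indices of observations to exclude), and excluded observations are coded NA. Whether resample passes subset in exactly this form is an assumption.

```r
# Hypothetical custom method: observation i is tested in fold ((i - 1) %% nfold) + 1,
# counted over the observations that are not excluded.
resample_interleaved <- function(y, nfold = 3, ..., subset) {
    n <- length(y)
    # Assumption: `subset` holds the indices of observations to leave out.
    excluded <- if (missing(subset)) integer(0) else subset
    included <- setdiff(seq_len(n), excluded)
    # Round-robin fold membership; excluded observations stay NA.
    membership <- rep(NA_integer_, n)
    membership[included] <- rep_len(seq_len(nfold), length(included))
    folds <- lapply(seq_len(nfold), function(k) {
        membership != k  # TRUE = training, FALSE = test, NA = excluded
    })
    names(folds) <- paste0("fold", seq_len(nfold))
    list(folds = as.data.frame(folds),
         parameter = list(nfold = nfold))
}

# Called directly here only to inspect the return value.
scheme <- resample_interleaved(y = 1:7, nfold = 3, subset = 2)
```

Following the naming convention described above, such a function would be invoked as resample("interleaved", ...) in an actual analysis; that call is not exercised here.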
A data frame defining a resampling scheme. TRUE or a positive integer codes for training set and FALSE or 0 codes for test set. Positive integers > 1 code for multiple copies of an observation in the training set. NA codes for neither training nor test set and is used to exclude observations from the analysis altogether.
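
The encoding can be made concrete with a small hand-written scheme. The helpers training_indices and test_indices below are hypothetical and written in base R only for this example. They expand one column of a scheme into observation indices according to the rules stated above.

```r
# Toy scheme with five observations; the column names are arbitrary.
scheme <- data.frame(
    fold1 = c(2L, 0L, 1L, NA, 0L),  # observation 1 is used twice for training
    fold2 = c(0L, 1L, 1L, NA, 3L)   # observation 4 is excluded altogether
)

# Hypothetical helper: training indices, repeated according to their count.
training_indices <- function(fold) {
    counts <- as.integer(fold)    # TRUE -> 1, FALSE -> 0, NA stays NA
    counts[is.na(counts)] <- 0L   # NA belongs to neither set
    rep(seq_along(counts), times = counts)
}

# Hypothetical helper: test indices are the entries coded FALSE or 0.
test_indices <- function(fold) {
    which(!is.na(fold) & as.integer(fold) == 0L)
}

fit1 <- training_indices(scheme$fold1)
test1 <- test_indices(scheme$fold1)
```

The same helpers accept logical columns, since TRUE and FALSE coerce to 1 and 0.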
Christofer Bäcklin
emil, subresample, image.resample, index_fit
