Performance evaluation and parameter tuning use resampling methods to estimate the performance of models. Each method is defined by a resampling scheme, a data frame in which each column corresponds to one division of the data set into mutually exclusive training and test sets. Repeated holdout and cross-validation are two methods for creating such schemes.
resample(method, y, ..., subset = TRUE)
resample_holdout(y, test_fraction = 0.5, nfold = 5,
    balanced = is.factor(y), subset)
resample_crossvalidation(y, nfold = 5, nrepeat = 5,
    balanced = is.factor(y), subset)
resample_bootstrap(y, nfold = 10, fit_fraction = if (replace) 1 else 0.632,
    replace = TRUE, balanced = is.factor(y), subset)
method
The resampling method to use, e.g.
y
Observations to be divided.
...
Sent to the method specific function, e.g.
subset
Which objects in
test_fraction
Fraction of objects to hold out (0 < test_fraction < 1).
nfold
Number of folds.
balanced
Whether the sets should be balanced or not, i.e. if the class ratio over the sets should be kept constant (as far as possible).
nrepeat
Number of fold sets to generate.
fit_fraction
The size of the training set relative to the entire data set.
replace
Whether to sample with replacement.
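To make the test_fraction and balanced arguments concrete, the following is a minimal base R sketch of drawing a single holdout fold. It is an illustration only and not the package's implementation; the helper name holdout_fold and the example data are hypothetical.

```r
# Illustrative sketch only; not emil's implementation.
# Draw one holdout fold, optionally keeping the class ratio constant.
holdout_fold <- function(y, test_fraction = 0.5, balanced = is.factor(y)) {
  stopifnot(test_fraction > 0, test_fraction < 1)
  n <- length(y)
  # Group observation indexes by class when balancing, otherwise use one group.
  groups <- if (balanced) split(seq_len(n), y) else list(seq_len(n))
  test_idx <- unlist(lapply(groups, function(idx) {
    n_test <- round(length(idx) * test_fraction)
    idx[sample.int(length(idx), n_test)]
  }), use.names = FALSE)
  fold <- rep(TRUE, n)      # TRUE codes for training set
  fold[test_idx] <- FALSE   # FALSE codes for test set
  fold
}

set.seed(1)
y <- factor(rep(c("a", "b"), times = c(30, 10)))  # made-up class labels
fold <- holdout_fold(y, test_fraction = 0.2)
table(y, fold)
```

With balancing enabled, the held-out fraction is applied within each class, so the test set here contains 6 observations of class "a" and 2 of class "b" regardless of the random seed.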
Note that when setting up analyses, the user should not call resample_holdout or resample_crossvalidation directly, as resample performs additional necessary processing of the scheme.
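A minimal sketch of the recommended calling pattern, assuming the emil package is installed and that the method name "holdout" dispatches to resample_holdout in the same way that "myMethod" dispatches to resample_myMethod. The response vector is made up for illustration.

```r
# Requires the emil package; the data below are made up for illustration.
if (requireNamespace("emil", quietly = TRUE)) {
  y <- factor(rep(c("case", "control"), each = 20))
  # Go through resample() rather than calling resample_holdout() directly,
  # so that the additional processing of the scheme is performed.
  scheme <- emil::resample("holdout", y, test_fraction = 0.25, nfold = 5)
  str(scheme)
}
```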
A resampling scheme can be visualized in a human-readable form with the image function.
Functions for generating custom resampling schemes should be implemented as follows and then called by resample("myMethod", ...):
resample_myMethod <- function(y, ..., subset)
y
Response vector.
...
Method specific attributes.
subset
Indexes of observations to be excluded from the resampling.
The function should return a list of the following elements:
folds
A data frame with the folds of the scheme that conforms to the description in the 'Value' section below.
parameter
A list with the parameters necessary to generate such a resampling scheme. These are required when creating the subschemes used for parameter tuning, see subresample.
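The interface described above can be sketched with a hypothetical leave-one-out method written in base R. The method name, the contents of the parameter list, and the example data are assumptions made for illustration; following the argument description above, subset is treated as indexes of observations to exclude.

```r
# Hypothetical custom method: leave-one-out, one fold per retained observation.
# The contents of `parameter` are an assumption made for illustration.
resample_leaveOneOut <- function(y, ..., subset) {
  n <- length(y)
  excluded <- if (missing(subset)) integer(0) else subset
  included <- setdiff(seq_len(n), excluded)
  folds <- lapply(included, function(i) {
    fold <- rep(NA, n)        # NA: neither training nor test set
    fold[included] <- TRUE    # training set
    fold[i] <- FALSE          # the single held-out test observation
    fold
  })
  names(folds) <- paste0("fold", seq_along(folds))
  list(folds = as.data.frame(folds),
       parameter = list(method = "leaveOneOut"))
}

# Made-up response vector; the fifth observation is excluded altogether.
res <- resample_leaveOneOut(y = c(2.1, 3.4, 1.8, 4.0, 2.9), subset = 5)
res$folds
```

Given the naming convention described above, such a function would presumably be invoked through resample("leaveOneOut", ...) rather than called directly.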
A data frame defining a resampling scheme. TRUE or a positive integer codes for training set and FALSE or 0 codes for test set. Positive integers > 1 code for multiple copies of an observation in the training set. NA codes for neither training nor test set and is used to exclude observations from the analysis altogether.
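The coding can be illustrated with a small hand-written scheme in base R. The data frame and the helper decode are hypothetical and exist only to show how each value is interpreted; they are not part of the package.

```r
# Hand-written scheme for six observations; values are made up for illustration.
scheme <- data.frame(
  fold1 = c(1, 1, 0, 0, 1, NA),   # plain split: 1 = training, 0 = test
  fold2 = c(2, 0, 1, 0, 1, NA)    # bootstrap-style: observation 1 drawn twice
)

# Expand one column into training and test indexes.
decode <- function(fold) {
  counts <- ifelse(is.na(fold), 0, as.integer(fold))   # NA contributes nothing
  list(fit  = rep(seq_along(fold), times = counts),    # repeated for counts > 1
       test = which(!is.na(fold) & fold == 0))
}
decoded <- lapply(scheme, decode)
decoded
```

Observation 6 is NA in both columns and therefore appears in neither set. Logical columns decode the same way, since TRUE and FALSE coerce to 1 and 0.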
Christofer Bäcklin
emil, subresample, image.resample, index_fit