resample: Resampling schemes


View source: R/resampling.r

Description

Performance evaluation and parameter tuning rely on resampling methods to estimate model performance. A resampling method is defined by a resampling scheme: a data frame in which each column corresponds to one division of the data set into mutually exclusive training and test sets. Repeated holdout and cross-validation are two methods for creating such schemes.
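To make the layout of a scheme concrete, the following base R sketch builds a small data frame by hand, with one column per fold and one row per observation. It does not call any emil function; the fold names and sizes are made up for illustration, and the object that resample actually returns may carry additional attributes.

```r
# Base R only; illustrates the layout of a scheme, not emil's implementation.
set.seed(1)
n <- 10

# Hypothetical scheme: 3 repeated holdout folds, each holding out 3 of 10 observations.
folds <- lapply(1:3, function(i) {
  fold <- rep(TRUE, n)          # TRUE marks the training set
  fold[sample(n, 3)] <- FALSE   # FALSE marks the test set
  fold
})
names(folds) <- paste0("fold", 1:3)
scheme <- as.data.frame(folds)

# Number of test observations in each fold
sapply(scheme, function(fold) sum(!fold))
```

Within each column, every observation is in exactly one of the two sets, which is what "mutually exclusive" means here.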

Usage

resample(method, y, ..., subset = TRUE)

resample_holdout(y, test_fraction = 0.5, nfold = 5,
  balanced = is.factor(y), subset)

resample_crossvalidation(y, nfold = 5, nrepeat = 5,
  balanced = is.factor(y), subset)

resample_bootstrap(y, nfold = 10, fit_fraction = if (replace) 1 else 0.632,
  replace = TRUE, balanced = is.factor(y), subset)

Arguments

method

The resampling method to use, e.g. "holdout" or "crossvalidation".

y

Observations to be divided.

...

Passed on to the method-specific function, e.g. resample_holdout.

subset

Which objects in y are to be divided and which are to be left out of both sets. If subset is a resampling scheme, a list of inner cross-validation schemes will be returned.

test_fraction

Fraction of objects to hold out (0 < test_fraction < 1).

nfold

Number of folds.

balanced

Whether the sets should be balanced or not, i.e. if the class ratio over the sets should be kept constant (as far as possible).

nrepeat

Number of fold sets to generate.

fit_fraction

The size of the training set relative to the entire data set.

replace

Whether to sample with replacement.
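As an illustration of what balanced means, the sketch below draws a single holdout fold in base R while keeping the class ratio the same in the training and test sets. The helper balanced_holdout is hypothetical and not part of emil; how emil balances its folds internally may differ.

```r
# Hypothetical helper, not part of emil: one balanced holdout fold in base R.
balanced_holdout <- function(y, test_fraction = 0.5) {
  fold <- rep(TRUE, length(y))                  # TRUE marks the training set
  for (idx in split(seq_along(y), y)) {         # observation indexes, one group per class
    n_test <- round(length(idx) * test_fraction)
    # Sample positions within idx, which avoids sample()'s special case for length-1 input.
    fold[idx[sample.int(length(idx), n_test)]] <- FALSE
  }
  fold
}

set.seed(1)
y <- factor(rep(c("a", "b"), c(40, 20)))
fold <- balanced_holdout(y, test_fraction = 0.25)
table(class = y, training = fold)
```

Because the test observations are drawn separately within each class, the 2:1 ratio between the classes in y carries over to both sets.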

Details

Note that when setting up analyses, the user should not call resample_holdout or resample_crossvalidation directly, as resample performs additional necessary processing of the scheme.

A resampling scheme can be visualized in human-readable form with the image function.

Functions for generating custom resampling schemes should be implemented as follows and then called by resample("myMethod", ...):

resample_myMethod <- function(y, ..., subset)

y

Response vector.

...

Method-specific arguments.

subset

Indexes of observations to be excluded from the resampling.

The function should return a list of the following elements:

folds

A data frame with the folds of the scheme that conforms to the description in the 'Value' section below.

parameter

A list with the parameters necessary to generate such a resampling scheme. They are used when creating the subschemes required for parameter tuning, see subresample.
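The interface described above can be sketched with a leave-one-out method. The name resample_leaveoneout is hypothetical and not part of emil. The sketch assumes that subset, when supplied, holds the indexes of the observations to exclude (as the argument description above states) and that an empty parameter list is acceptable for a method that takes no tuning parameters.

```r
# Hypothetical custom method written against the interface described above.
resample_leaveoneout <- function(y, ..., subset) {
  n <- length(y)
  # Assumption: subset holds the indexes of the observations to exclude.
  excluded <- if (missing(subset)) integer(0) else subset
  included <- setdiff(seq_len(n), excluded)

  # One fold per included observation, which serves as the only test observation.
  folds <- lapply(included, function(i) {
    fold <- rep(TRUE, n)   # TRUE: training set
    fold[i] <- FALSE       # FALSE: test set
    fold[excluded] <- NA   # NA: in neither set
    fold
  })
  names(folds) <- paste0("fold", included)

  # Assumption: leave-one-out needs no parameters, so the list is empty.
  list(folds = as.data.frame(folds), parameter = list())
}

scheme <- resample_leaveoneout(1:6, subset = c(2, 5))
scheme$folds
```

Following the naming rule above, such a function would presumably be invoked as resample("leaveoneout", y); that call has not been tried against emil here.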

Value

A data frame defining a resampling scheme. TRUE or a positive integer codes for the training set, and FALSE or 0 codes for the test set. Positive integers > 1 code for multiple copies of an observation in the training set. NA codes for neither the training set nor the test set and is used to exclude observations from the analysis altogether.
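The coding can be read back into index vectors with a few lines of base R. The fold below is made up for illustration, and the sketch does not use index_fit or any other emil helper.

```r
# One fold over 8 observations, coded as described above (made-up values).
fold <- c(1, 0, 2, NA, 1, 0, 0, 3)

test_index <- which(fold == 0)            # 0 (or FALSE) codes for the test set
excluded_index <- which(is.na(fold))      # NA codes for neither set

# Positive values code for the training set; a value > 1 means multiple copies,
# so each training index is repeated as many times as its count.
in_training <- !is.na(fold) & fold > 0
train_index <- rep(which(in_training), times = fold[in_training])

train_index
test_index
excluded_index
```

A logical fold works the same way, since TRUE and FALSE are treated as 1 and 0 in the comparisons.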

Author(s)

Christofer Bäcklin

See Also

emil, subresample, image.resample, index_fit

Examples

resample("holdout", 1:50, test_fraction=1/3)
resample("holdout", factor(runif(60) >= .5))
y <- factor(runif(60) >= .5)
cv <- resample("crossvalidation", y)
image(cv, main="Cross-validation scheme")

emil documentation built on Aug. 1, 2018, 1:03 a.m.