makeResampleDesc: Create a description object for a resampling strategy.


View source: R/ResampleDesc.R

Description

A description of a resampling algorithm contains all necessary information to create a ResampleInstance, when given the size of the data set.
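As a minimal sketch of this relationship (assuming the mlr package is installed; the data set size of 30 is an arbitrary illustrative value), a description can be turned into a concrete instance with makeResampleInstance:

```r
library(mlr)

# Describe 3-fold cross-validation; no data is involved yet
rdesc = makeResampleDesc("CV", iters = 3)

# Instantiate the description for a data set with 30 observations
rin = makeResampleInstance(rdesc, size = 30)

# One train index set and one test index set per iteration
length(rin$train.inds)
length(rin$test.inds)
```

The description itself stays independent of any data, so the same object can be instantiated for data sets of different sizes.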

Usage

makeResampleDesc(method, predict = "test", ..., stratify = FALSE,
  stratify.cols = NULL)

Arguments

method

[character(1)]
“CV” for cross-validation, “LOO” for leave-one-out, “RepCV” for repeated cross-validation, “Bootstrap” for out-of-bag bootstrap, “Subsample” for subsampling, “Holdout” for holdout.

predict

[character(1)]
What to predict during resampling: “train”, “test” or “both” sets. Default is “test”.

...

[any]
Further parameters for strategies.

iters [integer(1)]

Number of iterations, for “CV”, “Subsample” and “Bootstrap”.

split [numeric(1)]

Proportion of training cases for “Holdout” and “Subsample”, a value between 0 and 1. Default is 2 / 3.

reps [integer(1)]

Repeats for “RepCV”. Here iters = folds * reps. Default is 10.

folds [integer(1)]

Folds in the repeated CV for RepCV. Here iters = folds * reps. Default is 10.

stratify

[logical(1)]
Should stratification be done for the target variable? For classification tasks, this means that the resampling strategy is applied to each class individually and the resulting index sets are joined, so that the class proportions in each training set match those in the original data set. Useful for imbalanced class sizes. For survival tasks, stratification is done on the events, resulting in training sets with comparable censoring rates.

stratify.cols

[character]
Stratify on specific columns referenced by name. All columns have to be factors. Note that you have to ensure yourself that stratification is possible, i.e. that each stratum contains enough observations. This argument and stratify are mutually exclusive.
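The following sketch illustrates how the strategy-specific parameters in ... and the stratify flag might be combined. It assumes the mlr package is installed; the variable names and the chosen values are illustrative only.

```r
library(mlr)

# Repeated CV: 5 folds repeated 10 times gives 5 * 10 = 50 iterations
rdesc.repcv = makeResampleDesc("RepCV", folds = 5, reps = 10)
rdesc.repcv$iters

# Subsampling: 20 iterations, each using 80% of the cases for training
rdesc.sub = makeResampleDesc("Subsample", iters = 20, split = 0.8)

# Stratified 3-fold CV that also predicts on the training sets
rdesc.strat = makeResampleDesc("CV", iters = 3, predict = "both",
  stratify = TRUE)
```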

Details

Notes on some special strategies:

Repeated cross-validation

Use “RepCV”. Then you have to set the aggregation function for your preferred performance measure to “testgroup.mean” via setAggregation.

B632 bootstrap

Use “Bootstrap” for bootstrap and set predict to “both”. Then you have to set the aggregation function for your preferred performance measure to “b632” via setAggregation.

B632+ bootstrap

Use “Bootstrap” for bootstrap and set predict to “both”. Then you have to set the aggregation function for your preferred performance measure to “b632plus” via setAggregation.

Fixed Holdout set

Use makeFixedHoldoutInstance.
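The special strategies above could be set up roughly as follows. This is a sketch, assuming the mlr and rpart packages are installed; the choice of the mmce measure, the classif.rpart learner, the iris.task example task and the index split are illustrative assumptions, not part of the original text.

```r
library(mlr)

# Repeated CV: aggregate the measure with testgroup.mean
rdesc.repcv = makeResampleDesc("RepCV", folds = 5, reps = 2)
mmce.repcv = setAggregation(mmce, testgroup.mean)

# B632 and B632+ bootstrap: predictions on both sets are needed
rdesc.boot = makeResampleDesc("Bootstrap", iters = 10, predict = "both")
mmce.b632 = setAggregation(mmce, b632)
mmce.b632plus = setAggregation(mmce, b632plus)

# Run the B632 bootstrap on the iris.task example task shipped with mlr
r = resample(makeLearner("classif.rpart"), iris.task, rdesc.boot,
  measures = mmce.b632, show.info = FALSE)
r$aggr

# Fixed holdout: every third observation is used for testing
test.inds = seq(3, 150, by = 3)
train.inds = setdiff(1:150, test.inds)
rin = makeFixedHoldoutInstance(train.inds, test.inds, size = 150)
```

Note that setAggregation returns a modified copy of the measure, so the original mmce object keeps its default aggregation.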

Object slots:

id [character(1)]

Name of resampling strategy.

iters [integer(1)]

Number of iterations. Note that this is always the complete number of generated train/test sets, so for a 10-times repeated 5-fold cross-validation it would be 50.

predict [character(1)]

See argument.

stratify [logical(1)]

See argument.

All parameters passed in ... under the respective argument name

See arguments.
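As a brief sketch (assuming the mlr package is installed), the slots listed above can be read with the $ operator. The example reuses the 10-times repeated 5-fold cross-validation mentioned for iters.

```r
library(mlr)

rdesc = makeResampleDesc("RepCV", folds = 5, reps = 10, stratify = TRUE)

rdesc$id        # name of the resampling strategy
rdesc$iters     # total number of train/test sets: 5 * 10
rdesc$predict   # taken from the predict argument
rdesc$stratify  # taken from the stratify argument
rdesc$folds     # parameters passed in ... are stored under their own names
rdesc$reps
```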

Value

[ResampleDesc].

Standard ResampleDesc objects

For common resampling strategies you can save some typing by using the following description objects:

hout

holdout a.k.a. test sample estimation (two-thirds training set, one-third testing set)

cv2

2-fold cross-validation

cv3

3-fold cross-validation

cv5

5-fold cross-validation

cv10

10-fold cross-validation
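A short sketch of how these predefined objects might be used, assuming the mlr and rpart packages are installed; the learner, task and measure are illustrative choices.

```r
library(mlr)

# cv5 is a ready-made description of 5-fold cross-validation
cv5$iters

# hout uses the default holdout split of 2/3 training cases
hout$split

# Predefined descriptions can be passed directly to resample()
r = resample(makeLearner("classif.rpart"), iris.task, cv5,
  measures = mmce, show.info = FALSE)
r$aggr
```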

See Also

Other resample: ResamplePrediction, ResampleResult, addRRMeasure, getRRPredictionList, getRRPredictions, getRRTaskDescription, getRRTaskDesc, makeResampleInstance, resample

Examples

library(mlr)

# Bootstrapping
makeResampleDesc("Bootstrap", iters = 10)
makeResampleDesc("Bootstrap", iters = 10, predict = "both")

# Subsampling
makeResampleDesc("Subsample", iters = 10, split = 3/4)
makeResampleDesc("Subsample", iters = 10)

# Holdout a.k.a. test sample estimation
makeResampleDesc("Holdout")

berndbischl/mlr documentation built on Nov. 21, 2017, 12:51 a.m.