library("mlr")
library("BBmisc")
library("ParamHelpers")
# Not strictly necessary, but otherwise we might get NAs later on
# if 'rpart' is not installed.
library("rpart")

# show grouped code output instead of single lines
knitr::opts_chunk$set(collapse = TRUE)
set.seed(123)

Resampling strategies are usually used to assess the performance of a learning algorithm: The entire data set is (repeatedly) split into training sets $D^{b}$ and test sets $D \setminus D^{b}$, $b = 1,\ldots,B$. The learner is trained on each training set, predictions are made on the corresponding test set (sometimes on the training set as well) and the performance measure $S(D^{b}, D \setminus D^{b})$ is calculated. Then the $B$ individual performance values are aggregated, most often by calculating the mean. There exist various resampling strategies, for example cross-validation and bootstrap, to mention just two popular approaches.
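To make the aggregation step explicit: with mean aggregation (the most common case) the overall performance estimate is

$$\hat{S} = \frac{1}{B} \sum_{b = 1}^{B} S(D^{b}, D \setminus D^{b}).$$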

knitr::include_graphics("../img/resampling.png")

If you want to read up on further details, the paper Resampling Strategies for Model Assessment and Selection by Simon is probably not a bad choice. Bernd has also published a paper Resampling methods for meta-model validation with recommendations for evolutionary computation, which contains detailed descriptions and lots of statistical background information on resampling methods.

Defining the resampling strategy

In mlr the resampling strategy can be defined via function makeResampleDesc(). It requires a string that specifies the resampling method and, depending on the selected strategy, further information like the number of iterations. Supported resampling strategies include cross-validation ("CV"), leave-one-out cross-validation ("LOO"), repeated cross-validation ("RepCV"), out-of-bag bootstrap and variants such as b632 ("Bootstrap"), subsampling, also called Monte-Carlo cross-validation ("Subsample"), and holdout (test sample) estimation ("Holdout").

For example if you want to use 3-fold cross-validation type:

# 3-fold cross-validation
rdesc = makeResampleDesc("CV", iters = 3)
rdesc

For holdout estimation use:

# Holdout estimation
rdesc = makeResampleDesc("Holdout")
rdesc

In order to save you some typing, mlr contains some pre-defined resample descriptions for very common strategies like holdout (hout (makeResampleDesc())) as well as cross-validation with different numbers of folds (e.g., cv3 (makeResampleDesc()), cv5 (makeResampleDesc()), or cv10 (makeResampleDesc())).

hout

cv3

Performing the resampling

Function resample() evaluates a Learner (makeLearner()) on a given machine learning Task() using the selected resampling strategy (makeResampleDesc()).

As a first example, the performance of linear regression (stats::lm()) on the BostonHousing (mlbench::BostonHousing()) data set is calculated using 3-fold cross-validation.

Generally, for $K$-fold cross-validation the data set $D$ is partitioned into $K$ subsets of (approximately) equal size. In the $b$-th of the $K$ iterations, the $b$-th subset is used for testing, while the union of the remaining parts forms the training set.

As usual, you can either pass a Learner (makeLearner()) object to resample() or, as done here, provide the class name "regr.lm" of the learner. Since no performance measure is specified, the default for regression learners (mean squared error, mse) is calculated.

# Specify the resampling strategy (3-fold cross-validation)
rdesc = makeResampleDesc("CV", iters = 3)

# Calculate the performance
r = resample("regr.lm", bh.task, rdesc)
## Resampling: cross-validation
## Measures:             mse
## [Resample] iter 1:    25.1371739
## [Resample] iter 2:    23.1279497
## [Resample] iter 3:    21.9152672
##
## Aggregated Result: mse.test.mean=23.3934636
##

r
## Resample Result
## Task: BostonHousing-example
## Learner: regr.lm
## Aggr perf: mse.test.mean=23.3934636
## Runtime: 0.0375051

The result r is an object of class ResampleResult (resample()). It contains performance results for the learner and some additional information like the runtime, predicted values, and optionally the models fitted in single resampling iterations.

# Peek into r
names(r)

r$aggr

r$measures.test

r$measures.test gives the performance on each of the 3 test data sets. r$aggr shows the aggregated performance value. Its name "mse.test.mean" indicates the performance measure, mse, and the method, test.mean (aggregations()), used to aggregate the 3 individual performances. test.mean (aggregations()) is the default aggregation scheme for most performance measures and, as the name implies, takes the mean over the performances on the test data sets.

Resampling in mlr works the same way for all types of learning problems and learners. Below is a classification example where a classification tree (rpart::rpart()) is evaluated on the Sonar (mlbench::Sonar()) data set by subsampling with 5 iterations.

In each subsampling iteration the data set $D$ is randomly partitioned into a training and a test set according to a given percentage, e.g., 2/3 training and 1/3 test set. If there is just one iteration, the strategy is commonly called holdout or test sample estimation.

You can calculate several measures at once by passing a list of Measure (makeMeasure()) objects to resample(). Below, the error rate (mmce), false positive and false negative rates (fpr, fnr), and the time it takes to train the learner (timetrain) are estimated by subsampling with 5 iterations.

# Subsampling with 5 iterations and default split ratio 2/3
rdesc = makeResampleDesc("Subsample", iters = 5)

# Subsampling with 5 iterations and 4/5 training data
rdesc = makeResampleDesc("Subsample", iters = 5, split = 4 / 5)

# Classification tree with information splitting criterion
lrn = makeLearner("classif.rpart", parms = list(split = "information"))

# Calculate the performance measures
r = resample(lrn, sonar.task, rdesc, measures = list(mmce, fpr, fnr, timetrain))
## Resampling: subsampling
## Measures:             mmce        fpr         fnr         timetrain
## [Resample] iter 1:    0.4047619   0.5416667   0.2222222   0.0110000
## [Resample] iter 2:    0.1666667   0.1200000   0.2352941   0.0070000
## [Resample] iter 3:    0.3333333   0.1333333   0.4444444   0.0100000
## [Resample] iter 4:    0.2380952   0.3913043   0.0526316   0.0280000
## [Resample] iter 5:    0.3095238   0.2800000   0.3529412   0.0080000
##
## Aggregated Result: mmce.test.mean=0.2904762,fpr.test.mean=0.2932609,fnr.test.mean=0.2615067,timetrain.test.mean=0.0128000
##

r
## Resample Result
## Task: Sonar-example
## Learner: classif.rpart
## Aggr perf: mmce.test.mean=0.2904762,fpr.test.mean=0.2932609,fnr.test.mean=0.2615067,timetrain.test.mean=0.0128000
## Runtime: 0.10692

If you want to add further measures afterwards, use addRRMeasure().

# Add balanced error rate (ber) and time used to predict
addRRMeasure(r, list(ber, timepredict))

## Resample Result
## Task: Sonar-example
## Learner: classif.rpart
## Aggr perf: mmce.test.mean=0.2904762,fpr.test.mean=0.2932609,fnr.test.mean=0.2615067,timetrain.test.mean=0.0128000,ber.test.mean=0.2773838,timepredict.test.mean=0.0032000
## Runtime: 0.10692

By default, resample() prints progress messages and intermediate results. You can turn this off by setting show.info = FALSE, as done in the code chunk below. (If you are interested in suppressing these messages permanently, have a look at the tutorial page about configuring mlr.)
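For example, a minimal sketch of a session-wide setting (assuming you want to silence the progress output for all subsequent calls, not just a single one):

# Suppress progress messages for all following mlr calls in this session
configureMlr(show.info = FALSE)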

In the above example, the Learner (makeLearner()) was explicitly constructed. For convenience you can also specify the learner as a string and pass any learner parameters via the ... argument of resample().

r = resample("classif.rpart", parms = list(split = "information"), sonar.task, rdesc,
  measures = list(mmce, fpr, fnr, timetrain), show.info = FALSE)

r
r = resample("classif.rpart", parms = list(split = "information"), sonar.task, rdesc,
  measures = list(mmce, fpr, fnr, timetrain), show.info = FALSE)

r

## Resample Result
## Task: Sonar-example
## Learner: classif.rpart
## Aggr perf: mmce.test.mean=0.2428571,fpr.test.mean=0.2968173,fnr.test.mean=0.1970195,timetrain.test.mean=0.0084000
## Runtime: 0.0791025

Accessing resample results

Apart from the learner performance you can extract further information from the resample results, for example predicted values or the models fitted in individual resample iterations.

Predictions

By default, the resample() result contains the predictions made during the resampling. If you do not want to keep them, e.g., in order to conserve memory, set keep.pred = FALSE when calling resample().
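A minimal sketch of this option (reusing the subsampling description rdesc from above; r.nopred is only an illustrative object name):

# Discard the predictions to conserve memory
r.nopred = resample("classif.rpart", sonar.task, rdesc, keep.pred = FALSE, show.info = FALSE)
r.nopred$pred  # no predictions are stored when keep.pred = FALSE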

The predictions are stored in slot $pred of the resampling result, which can also be accessed by function getRRPredictions().

r$pred
## Resampled Prediction for:
## Resample description: subsampling with 5 iterations and 0.80 split rate.
## Predict: test
## Stratification: FALSE
## predict.type: response
## threshold:
## time (mean): 0.00
##    id truth response iter  set
## 1  18     R        M    1 test
## 2 189     M        M    1 test
## 3  88     R        R    1 test
## 4 121     M        R    1 test
## 5 165     M        R    1 test
## 6 111     M        M    1 test
## ... (#rows: 210, #cols: 5)

pred = getRRPredictions(r)
pred
## Resampled Prediction for:
## Resample description: subsampling with 5 iterations and 0.80 split rate.
## Predict: test
## Stratification: FALSE
## predict.type: response
## threshold:
## time (mean): 0.00
##    id truth response iter  set
## 1  18     R        M    1 test
## 2 189     M        M    1 test
## 3  88     R        R    1 test
## 4 121     M        R    1 test
## 5 165     M        R    1 test
## 6 111     M        M    1 test
## ... (#rows: 210, #cols: 5)

pred is an object of class ResamplePrediction (resample()). Just like a Prediction() object (see the tutorial page on making predictions) it has an element $data which is a data.frame that contains the predictions and, in the case of a supervised learning problem, the true values of the target variable(s). You can use as.data.frame (Prediction()) to directly access the $data slot. Moreover, all getter functions for Prediction() objects like getPredictionResponse() or getPredictionProbabilities() are applicable.

head(as.data.frame(pred))

head(getPredictionTruth(pred))

head(getPredictionResponse(pred))
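getPredictionProbabilities() only applies when the learner actually produces probability predictions. A minimal sketch (assuming you rerun the resampling with predict.type = "prob"; the object names are only illustrative):

# Probability predictions are needed for getPredictionProbabilities()
lrn.prob = makeLearner("classif.rpart", predict.type = "prob")
r.prob = resample(lrn.prob, sonar.task, rdesc, show.info = FALSE)
head(getPredictionProbabilities(r.prob$pred))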

The columns iter and set in the data.frame indicate the resampling iteration and the data set (train or test) for which the prediction was made.

By default, predictions are made for the test sets only. If predictions for the training set are required, set predict = "train" (for predictions on the training set only) or predict = "both" (for predictions on both training and test sets) in makeResampleDesc(). Note that predictions on the training sets are required for some bootstrap methods (b632 and b632+); examples are shown later on.

Below, we use simple holdout, i.e., we split the data once into a training and a test set, as the resampling strategy and make predictions on both sets.

# Make predictions on both training and test sets
rdesc = makeResampleDesc("Holdout", predict = "both")

r = resample("classif.lda", iris.task, rdesc, show.info = FALSE)
r
## Resample Result
## Task: iris-example
## Learner: classif.lda
## Aggr perf: mmce.test.mean=0.0200000
## Runtime: 0.00993848

r$measures.train
##   iter mmce
## 1    1 0.02

(Please note that the aggregated misclassification rate r$aggr is nonetheless estimated on the test data only. How to calculate performance measures on the training sets is shown below.)

A second function to extract predictions from resample results is getRRPredictionList() which returns a list of predictions split by data set (train/test) and resampling iteration.

predList = getRRPredictionList(r)
predList
## $train
## $train$`1`
## Prediction: 100 observations
## predict.type: response
## threshold:
## time: 0.00
##      id      truth   response
## 96   96 versicolor versicolor
## 130 130  virginica  virginica
## 120 120  virginica  virginica
## 77   77 versicolor versicolor
## 23   23     setosa     setosa
## 59   59 versicolor versicolor
## ... (#rows: 100, #cols: 3)
##
##
## $test
## $test$`1`
## Prediction: 50 observations
## predict.type: response
## threshold:
## time: 0.00
##      id      truth   response
## 92   92 versicolor versicolor
## 58   58 versicolor versicolor
## 48   48     setosa     setosa
## 103 103  virginica  virginica
## 70   70 versicolor versicolor
## 82   82 versicolor versicolor
## ... (#rows: 50, #cols: 3)

Learner models

In each resampling iteration a Learner (makeLearner()) is fitted on the respective training set. By default, the resulting WrappedModel (makeWrappedModel())s are not included in the resample() result and slot $models is empty. In order to keep them, set models = TRUE when calling resample(), as in the following survival analysis example.

# 3-fold cross-validation
rdesc = makeResampleDesc("CV", iters = 3)

r = resample("surv.coxph", lung.task, rdesc, show.info = FALSE, models = TRUE)
r$models

The extract option

Keeping complete fitted models can be memory-intensive if these objects are large or the number of resampling iterations is high. Alternatively, you can use the extract argument of resample() to retain only the information you need. To this end you need to pass a function to extract which is applied to each WrappedModel (makeWrappedModel()) object fitted in each resampling iteration.

Below, we cluster the datasets::mtcars() data using the $k$-means algorithm with $k = 3$ and keep only the cluster centers.

# 3-fold cross-validation
rdesc = makeResampleDesc("CV", iters = 3)

# Extract the computed cluster centers
r = resample("cluster.kmeans", mtcars.task, rdesc, show.info = FALSE,
  centers = 3, extract = function(x) getLearnerModel(x)$centers)
r$extract

As a second example, we extract the variable importances from fitted regression trees using function getFeatureImportance(). (For more detailed information on this topic see the feature selection page.)

# Extract the variable importance in a regression tree
r = resample("regr.rpart", bh.task, rdesc, show.info = FALSE, extract = getFeatureImportance)
r$extract

There is also a convenience function getResamplingIndices() to extract the resampling indices from the ResampleResult object:

getResamplingIndices(r)

Stratification, Blocking and Grouping

Stratification with respect to the target variable(s)

For classification, it is usually desirable to have the same proportion of the classes in all of the partitions of the original data set. This is particularly useful in the case of imbalanced classes and small data sets. Otherwise, it may happen that observations of less frequent classes are missing in some of the training sets, which can decrease the performance of the learner or lead to model crashes. In order to conduct stratified resampling, set stratify = TRUE in makeResampleDesc().

# 3-fold cross-validation
rdesc = makeResampleDesc("CV", iters = 3, stratify = TRUE)

r = resample("classif.lda", iris.task, rdesc, show.info = FALSE)
r
## Resample Result
## Task: iris-example
## Learner: classif.lda
## Aggr perf: mmce.test.mean=0.0200000
## Runtime: 0.0176296

Stratification is also available for survival tasks. Here the stratification balances the censoring rate.
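A minimal sketch (reusing the lung.task and Cox model from the survival example above; here stratify = TRUE balances the proportion of censored observations across the folds):

# Stratified 3-fold cross-validation for a survival task
rdesc = makeResampleDesc("CV", iters = 3, stratify = TRUE)
r = resample("surv.coxph", lung.task, rdesc, show.info = FALSE)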

Stratification with respect to explanatory variables

Sometimes it is required to also stratify on the input data, e.g., to ensure that all subgroups are represented in all training and test sets. To stratify on the input columns, specify factor columns of your task data via stratify.cols.

rdesc = makeResampleDesc("CV", iters = 3, stratify.cols = "chas")

r = resample("regr.rpart", bh.task, rdesc, show.info = FALSE)
r
rdesc = makeResampleDesc("CV", iters = 3, stratify.cols = "chas")

r = resample("regr.rpart", bh.task, rdesc, show.info = FALSE)
r
## Resample Result
## Task: BostonHousing-example
## Learner: regr.rpart
## Aggr perf: mse.test.mean=23.8843587
## Runtime: 0.0268815

Blocking: CV with flexible predefined indices

If some observations "belong together" and must not be separated when splitting the data into training and test sets for resampling, you can supply this information via a blocking factor when creating the task{target="_blank"}.

# 5 blocks containing 30 observations each
task = makeClassifTask(data = iris, target = "Species", blocking = factor(rep(1:5, each = 30)))
task

When performing a simple "CV" resampling and inspecting the result, we see that the training indices in fold 1 respect the grouping specified via blocking in the task. To enable this behavior, we need to set blocking.cv = TRUE when creating the resample description object.

rdesc = makeResampleDesc("CV", iters = 3, blocking.cv = TRUE)
p = resample("classif.lda", task, rdesc)

sort(p$pred$instance$train.inds[[1]])

However, please note the effect of this method: the created folds will not all have the same size! Here, fold 1 has a 120/30 (train/test) split while the other two folds have a 90/60 split.

lapply(p$pred$instance$train.inds, function(x) length(x))

This happens because we supplied five groups of observations that must stay together but used only a three-fold resampling strategy.

Grouping: CV with fixed predefined indices

There is a second way of using predefined indices for resampling in mlr: constructing the folds directly from the indices supplied via blocking. We refer to this method here as "grouping" to distinguish it from "blocking". It is more restrictive in that it always uses the number of levels supplied via blocking as the number of folds. To use this method, we need to set fixed = TRUE instead of blocking.cv when creating the resampling description object.

We can leave out the iters argument, as it will be set internally to the number of supplied factor levels.

rdesc = makeResampleDesc("CV", fixed = TRUE)
p = resample("classif.lda", task, rdesc)
sort(p$pred$instance$train.inds[[1]])

You can see that we automatically created five folds in which the test set always corresponds to one factor level.

Doing it this way also means that we cannot do repeated CV because there is no way to create multiple shuffled folds of this fixed arrangement.

lapply(p$pred$instance$train.inds, function(x) length(x))

However, this method can also be used in nested resampling settings (e.g. in hyperparameter tuning). In the inner level, the factor levels are honored and the function simply creates one fold less than in the outer level.

Please note that the iters argument has no effect in makeResampleDesc() if fixed = TRUE. The number of folds will be automatically set based on the supplied number of factor levels via blocking. In the inner level, the number of folds will simply be one less than in the outer level.

# test fixed in nested resampling
lrn = makeLearner("classif.lda")
ctrl = makeTuneControlRandom(maxit = 2)
ps = makeParamSet(makeNumericParam("nu", lower = 2, upper = 20))
inner = makeResampleDesc("CV", fixed = TRUE)
outer = makeResampleDesc("CV", fixed = TRUE)
tune_wrapper = makeTuneWrapper(lrn, resampling = inner, par.set = ps,
  control = ctrl, show.info = FALSE)

p = resample(tune_wrapper, task, outer, show.info = FALSE,
  extract = getTuneResult)

To check on the inner resampling indices, you can call getResamplingIndices() with inner = TRUE. You can see that for every outer fold (List of 5), four inner folds were created that respect the grouping supplied via the blocking argument.

Of course you can also use a normal random sampling "CV" description in the inner level by just setting fixed = FALSE.

str(getResamplingIndices(p, inner = TRUE))
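To illustrate the fixed = FALSE alternative mentioned above, here is a minimal sketch of a tuning wrapper whose inner level uses an ordinary random 2-fold CV instead of the grouping-based folds (inner.random and tune_wrapper2 are only illustrative names):

# Random inner CV instead of the fixed, grouping-based folds
inner.random = makeResampleDesc("CV", iters = 2, fixed = FALSE)
tune_wrapper2 = makeTuneWrapper(lrn, resampling = inner.random, par.set = ps,
  control = ctrl, show.info = FALSE)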

Resample descriptions and resample instances

As already mentioned, you can specify a resampling strategy using function makeResampleDesc().

rdesc = makeResampleDesc("CV", iters = 3)
rdesc

str(rdesc)

str(makeResampleDesc("Subsample", stratify.cols = "chas"))

The result rdesc inherits from class ResampleDesc (makeResampleDesc()) (short for resample description) and, in principle, contains all necessary information about the resampling strategy including the number of iterations, the proportion of training and test sets, stratification variables, etc.

Given either the size of the data set at hand or the Task(), function makeResampleInstance() draws the training and test sets according to the ResampleDesc (makeResampleDesc()).

# Create a resample instance based on a task
rin = makeResampleInstance(rdesc, iris.task)
rin

str(rin)

# Create a resample instance given the size of the data set
rin = makeResampleInstance(rdesc, size = nrow(iris))
str(rin)

# Access the indices of the training observations in iteration 3
rin$train.inds[[3]]

The result rin inherits from class ResampleInstance (makeResampleInstance()) and contains lists of index vectors for the train and test sets.

If a ResampleDesc (makeResampleDesc()) is passed to resample(), it is instantiated internally. Naturally, it is also possible to pass a ResampleInstance (makeResampleInstance()) directly.

While the separation between resample descriptions, resample instances, and the resample() function itself may seem overly complicated, it has a clear advantage: a resample instance lets you evaluate several learners on exactly the same training and test sets, which makes their performance values directly comparable.

rdesc = makeResampleDesc("CV", iters = 3)
rin = makeResampleInstance(rdesc, task = iris.task)

# Calculate the performance of two learners based on the same resample instance
r.lda = resample("classif.lda", iris.task, rin, show.info = FALSE)
r.rpart = resample("classif.rpart", iris.task, rin, show.info = FALSE)
r.lda$aggr

r.rpart$aggr

Usually, when calling makeResampleInstance() the train and test index sets are drawn randomly. Mainly for holdout (test sample) estimation you might want full control over the training and test sets and specify them manually. This can be done using function makeFixedHoldoutInstance().

rin = makeFixedHoldoutInstance(train.inds = 1:100, test.inds = 101:150, size = 150)
rin

Aggregating performance values

In each resampling iteration $b = 1,\ldots,B$ we get performance values $S(D^{b}, D \setminus D^{b})$ (for each measure we wish to calculate), which are then aggregated to an overall performance.

For the great majority of common resampling strategies (like holdout, cross-validation, subsampling) performance values are calculated on the test data sets only and, for most measures, aggregated by taking the mean (test.mean (aggregations())).

Each performance Measure (makeMeasure()) in mlr has a corresponding default aggregation method which is stored in slot $aggr. The default aggregation for most measures is test.mean (aggregations()). One exception is the root mean square error (rmse).

# Mean misclassification error
mmce$aggr
## Aggregation function: test.mean

mmce$aggr$fun
## function (task, perf.test, perf.train, measure, group, pred)
## mean(perf.test)
## <bytecode: 0xc8eade8>
## <environment: namespace:mlr>

# Root mean square error
rmse$aggr
## Aggregation function: test.rmse

rmse$aggr$fun
## function (task, perf.test, perf.train, measure, group, pred)
## sqrt(mean(perf.test^2))
## <bytecode: 0x23ad72d8>
## <environment: namespace:mlr>

You can change the aggregation method of a Measure (makeMeasure()) via function setAggregation(). All available aggregation schemes are listed on the aggregations() documentation page.

Example: One measure with different aggregations

The aggregation schemes test.median (aggregations()), test.min (aggregations()), and test.max (aggregations()) compute the median, minimum, and maximum of the performance values on the test sets.

mseTestMedian = setAggregation(mse, test.median)
mseTestMin = setAggregation(mse, test.min)
mseTestMax = setAggregation(mse, test.max)

mseTestMedian
## Name: Mean of squared errors
## Performance measure: mse
## Properties: regr,req.pred,req.truth
## Minimize: TRUE
## Best: 0; Worst: Inf
## Aggregated by: test.median
## Arguments:
## Note: Defined as: mean((response - truth)^2)

rdesc = makeResampleDesc("CV", iters = 3)
r = resample("regr.lm", bh.task, rdesc, measures = list(mse, mseTestMedian, mseTestMin, mseTestMax))
## Resampling: cross-validation
## Measures:             mse         mse         mse         mse
## [Resample] iter 1:    24.0782026  24.0782026  24.0782026  24.0782026
## [Resample] iter 2:    29.4983077  29.4983077  29.4983077  29.4983077
## [Resample] iter 3:    18.6894718  18.6894718  18.6894718  18.6894718
##
## Aggregated Result: mse.test.mean=24.0886607,mse.test.median=24.0782026,mse.test.min=18.6894718,mse.test.max=29.4983077
##

r
## Resample Result
## Task: BostonHousing-example
## Learner: regr.lm
## Aggr perf: mse.test.mean=24.0886607,mse.test.median=24.0782026,mse.test.min=18.6894718,mse.test.max=29.4983077
## Runtime: 0.0288048

r$aggr
##   mse.test.mean mse.test.median    mse.test.min    mse.test.max
##        24.08866        24.07820        18.68947        29.49831

Example: Calculating the training error

Below we calculate the mean misclassification error (mmce) on the training and the test data sets. Note that we have to set predict = "both" when calling makeResampleDesc() in order to get predictions on both training and test sets.

mmceTrainMean = setAggregation(mmce, train.mean)
rdesc = makeResampleDesc("CV", iters = 3, predict = "both")
r = resample("classif.rpart", iris.task, rdesc, measures = list(mmce, mmceTrainMean))

r$measures.train

r$aggr

Example: Bootstrap

In out-of-bag bootstrap estimation $B$ new data sets $D^{1}, \ldots, D^{B}$ are drawn from the data set $D$ with replacement, each of the same size as $D$. In the $b$-th iteration, $D^{b}$ forms the training set, while the remaining elements from $D$, i.e., $D \setminus D^{b}$, form the test set.

The b632 and b632+ variants calculate a convex combination of the training performance and the out-of-bag bootstrap performance and thus require predictions on the training sets and an appropriate aggregation strategy.
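For reference, the classical .632 estimator that the b632 aggregation scheme is based on combines the two parts as

$$\hat{S}_{b632} = 0.368 \cdot \hat{S}_{\text{train}} + 0.632 \cdot \hat{S}_{\text{OOB}},$$

and b632+ replaces the fixed weight 0.632 with an adaptive weight derived from the no-information error rate.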

# Use bootstrap as resampling strategy and predict on both train and test sets
rdesc = makeResampleDesc("Bootstrap", predict = "both", iters = 10)

# Set aggregation schemes for b632 and b632+ bootstrap
mmceB632 = setAggregation(mmce, b632)
mmceB632plus = setAggregation(mmce, b632plus)

mmceB632

r = resample("classif.rpart", iris.task, rdesc, measures = list(mmce, mmceB632, mmceB632plus),
  show.info = FALSE)
head(r$measures.train)

# Compare misclassification rates for out-of-bag, b632, and b632+ bootstrap
r$aggr

Convenience functions

The functionality described on this page allows for much control and flexibility. However, when quickly trying out some learners, it can get tedious to type all the code for defining the resampling strategy, setting the aggregation scheme and so on. As mentioned above, mlr includes some pre-defined resample description objects for frequently used strategies like 5-fold cross-validation (cv5 (makeResampleDesc())). Moreover, mlr provides special convenience functions for the most common resampling methods, for example holdout (resample()), crossval (resample()), or bootstrapB632 (resample()).

crossval("classif.lda", iris.task, iters = 3, measures = list(mmce, ber))
## Resampling: cross-validation
## Measures:             mmce      ber
## [Resample] iter 1:    0.0200000 0.0238095
## [Resample] iter 2:    0.0400000 0.0370370
## [Resample] iter 3:    0.0000000 0.0000000
##
## Aggregated Result: mmce.test.mean=0.0200000,ber.test.mean=0.0202822
##
## Resample Result
## Task: iris-example
## Learner: classif.lda
## Aggr perf: mmce.test.mean=0.0200000,ber.test.mean=0.0202822
## Runtime: 0.0205457

bootstrapB632plus("regr.lm", bh.task, iters = 3, measures = list(mse, mae))
## Resampling: OOB bootstrapping
## Measures:             mse.train   mae.train   mse.test    mae.test
## [Resample] iter 1:    24.6425511  3.4107320   16.3415466  3.0123263
## [Resample] iter 2:    17.0963191  2.9809210   29.6056968  3.7131236
## [Resample] iter 3:    23.1440608  3.5079975   24.4183753  3.3467443
##
## Aggregated Result: mse.b632plus=22.9359054,mae.b632plus=3.3459751
##
## Resample Result
## Task: BostonHousing-example
## Learner: regr.lm
## Aggr perf: mse.b632plus=22.9359054,mae.b632plus=3.3459751
## Runtime: 0.0419419

