ResamplingSameOtherSizesCV: Resampling for comparing train subsets and sizes

In mlr3resampling: Resampling Algorithms for 'mlr3' Framework

Description

ResamplingSameOtherSizesCV defines how a task is partitioned for resampling, for example in resample() or benchmark().

Resampling objects can be instantiated on a Task, which may define a column with the subset role.

After instantiation, the i-th training and test sets can be accessed via `$train_set(i)` and `$test_set(i)`, respectively.

Details

This is an implementation of SOAK, Same/Other/All K-fold cross-validation. A supervised learning algorithm takes a train set as input and outputs a prediction function, which can be applied to a test set. If each data point belongs to a subset (such as geographic region, year, etc.), how do we know whether it is possible to train on one subset and predict accurately on another? Cross-validation can be used to determine the extent to which this is possible. First, fold IDs from 1 to K are assigned to all data (possibly using stratification, usually by subset and label). Then we loop over test sets (subset/fold combinations) and train sets (same subset, other subsets, all subsets), and compute test/prediction accuracy for each combination. Comparing test accuracy between same and other shows the extent to which this is possible: it is perfectly possible if same and other have similar test accuracy for each subset; usually other is somewhat less accurate than same; and other can be as bad as the featureless baseline when the subsets have different patterns.
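The loop described above can be sketched in base R. This is a hypothetical illustration, not the mlr3resampling implementation: the function soak_splits and its arguments are assumptions, folds are assigned within each subset without stratification by label, and every train set is assumed to use only the folds other than the test fold.

```r
# Hypothetical sketch of SOAK splits (not the mlr3resampling implementation).
# subset_vec: subset label of each observation; K: number of folds.
soak_splits <- function(subset_vec, K = 3, seed = 1) {
  set.seed(seed)  # note: this resets the global random number generator
  fold <- integer(length(subset_vec))
  # Assign fold IDs 1..K within each subset, in roughly equal numbers.
  for (s in unique(subset_vec)) {
    idx <- which(subset_vec == s)
    fold[idx] <- rep_len(seq_len(K), length(idx))[sample.int(length(idx))]
  }
  splits <- list()
  for (test_subset in unique(subset_vec)) {
    for (test_fold in seq_len(K)) {
      test <- which(subset_vec == test_subset & fold == test_fold)
      # Assumption: every train set uses only folds other than the test fold.
      not_test_fold <- fold != test_fold
      train_sets <- list(
        same  = which(not_test_fold & subset_vec == test_subset),
        other = which(not_test_fold & subset_vec != test_subset),
        all   = which(not_test_fold))
      for (train_subsets in names(train_sets)) {
        splits[[length(splits) + 1]] <- list(
          test_subset = test_subset, test_fold = test_fold,
          train_subsets = train_subsets,
          train = train_sets[[train_subsets]], test = test)
      }
    }
  }
  splits
}

# Example: two subsets of 6 observations each, K = 3.
subset_vec <- rep(c("A", "B"), each = 6)
splits <- soak_splits(subset_vec, K = 3)
length(splits)  # 18 = 2 test subsets x 3 test folds x 3 train subsets
```

Each element of splits pairs one test set with one of the same/other/all train sets; fitting a learner on each train set and scoring it on the matching test set gives the accuracy comparison described above.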

This class has more parameters and potential applications than ResamplingSameOtherCV and ResamplingVariableSizeTrainCV, which are older and should be preferred only for visualization purposes.

Stratification

ResamplingSameOtherSizesCV supports stratified sampling. The stratification variables are assumed to be discrete, and must be stored in the Task with column role "stratum". In case of multiple stratification variables, each combination of the values of the stratification variables forms a stratum.
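To illustrate how several stratification variables combine into strata, here is a minimal base R sketch. The variable names and the function stratified_folds are hypothetical; the package defines strata from the Task's "stratum" column role, and its exact fold assignment may differ.

```r
# Hypothetical stratification variables (e.g. subset and label).
subset_vec <- rep(c("A", "B"), each = 6)
label_vec <- rep(c("yes", "no"), times = 6)
# Each observed combination of values forms one stratum, e.g. "A/yes".
stratum <- interaction(subset_vec, label_vec, drop = TRUE, sep = "/")

# Assign folds 1..K within each stratum so every fold gets a similar mix.
stratified_folds <- function(stratum, K = 3) {
  fold <- integer(length(stratum))
  for (s in levels(stratum)) {
    idx <- which(stratum == s)
    fold[idx] <- rep_len(seq_len(K), length(idx))[sample.int(length(idx))]
  }
  fold
}
set.seed(1)
fold <- stratified_folds(stratum, K = 3)
table(stratum, fold)  # each of the 4 strata has one observation per fold
```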

Grouping

ResamplingSameOtherSizesCV supports grouping: observations in the same group are assigned to the same fold, so a group is never split in cross-validation. The grouping variable is assumed to be discrete, and must be stored in the Task with column role "group".
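The idea of keeping groups intact can be sketched by assigning folds to groups rather than to observations. This is a hypothetical base R helper (group_folds is not a package function), and it ignores stratification, which the package can combine with grouping.

```r
# Hypothetical sketch: assign folds to groups, not to individual observations.
group_folds <- function(group, K = 3) {
  groups <- unique(group)
  # One fold ID per group, spread over 1..K in roughly equal numbers.
  group_fold <- rep_len(seq_len(K), length(groups))[sample.int(length(groups))]
  names(group_fold) <- as.character(groups)
  # Every observation inherits the fold of its group.
  unname(group_fold[as.character(group)])
}
set.seed(1)
group <- rep(letters[1:6], each = 2)
fold <- group_folds(group, K = 3)
tapply(fold, group, function(f) length(unique(f)))  # 1 for every group
```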

Subsets

ResamplingSameOtherSizesCV supports training on different subsets of observations. The subset variable is assumed to be discrete, and must be stored in the Task with column role "subset".

Parameters

The number of cross-validation folds K is set via the folds parameter, default 3.

The number of random seeds for down-sampling is set via the seeds parameter, default 1.

The ratio for down-sampling is set via the ratio parameter, default 0.5. The minimum size of the same and other sets is repeatedly multiplied by this ratio to obtain smaller sample sizes.

The number of down-sampling sizes (multiplications) is set via the sizes parameter, which also accepts two special values: -1 (default) means no down-sampling at all, and 0 means down-sampling only to the sizes of the same/other sets.
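The interaction of ratio and sizes can be sketched as follows. This is a hypothetical helper (downsample_sizes is not a package function); it assumes that sizes = n yields the minimum set size multiplied by ratio^0, ratio^1, ..., ratio^n, and that fractional sizes are rounded down, which may not match the package exactly.

```r
# Hypothetical sketch of down-sampling sizes (assumed rule, floor rounding).
downsample_sizes <- function(same_size, other_size, ratio = 0.5, sizes = -1) {
  if (sizes < 0) return(integer(0))  # -1: no down-sampling at all
  min_size <- min(same_size, other_size)
  # sizes = 0 keeps only min_size; each extra size multiplies by ratio once more.
  as.integer(floor(min_size * ratio^(0:sizes)))
}
downsample_sizes(100, 80, ratio = 0.5, sizes = 2)  # 80 40 20
```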

The ignore_subset parameter, either TRUE or FALSE (default), controls whether the subset role is ignored. TRUE creates only same-subset splits (even if the task defines a subset role), which is useful for subtrain/validation splits (hyper-parameter learning). Note that, unlike ResamplingCV, this feature works on a task with both stratum and group roles.

The subsets parameter should specify the train subsets of interest: "S" (same), "O" (other), "A" (all), "SO", "SA", "SOA" (default).

In each subset, observations are assigned to the K folds in roughly equal numbers. The train/test splits are defined by all possible combinations of test subset, test fold, train subsets (same/other/all), down-sampling sizes, and random seeds. The splits are stored in `$instance$iteration.dt`.
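The combinatorial structure of the splits can be sketched with expand.grid. This hypothetical helper (iteration_table) only counts the combinations; its column names are assumptions rather than the actual columns of `$instance$iteration.dt`, and it omits down-sampling sizes because those depend on the sizes of the same/other sets.

```r
# Hypothetical sketch: enumerate split combinations (sizes omitted).
iteration_table <- function(test_subsets, folds = 3, subsets = "SOA", seeds = 1) {
  # Map each letter of the subsets parameter to a train-subset name.
  letter_names <- c(S = "same", O = "other", A = "all")
  subset_letters <- strsplit(subsets, "")[[1]]
  stopifnot(all(subset_letters %in% names(letter_names)))
  expand.grid(
    test.subset = test_subsets,
    test.fold = seq_len(folds),
    train.subsets = unname(letter_names[subset_letters]),
    seed = seq_len(seeds),
    stringsAsFactors = FALSE)
}
it <- iteration_table(c("A", "B"), folds = 5, subsets = "SOA", seeds = 2)
nrow(it)  # 60 = 2 test subsets x 5 folds x 3 train subsets x 2 seeds
```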

Methods

Method `new()`

Creates a new instance of this R6 class.

Usage

Resampling$new(
  id,
  param_set = ps(),
  duplicated_ids = FALSE,
  label = NA_character_,
  man = NA_character_
)

Arguments

id: (character(1))
Identifier for the new instance.
param_set: (paradox::ParamSet)
Set of hyperparameters.
duplicated_ids: (logical(1))
Set to TRUE if this resampling strategy may have duplicated row ids in a single training set or test set.
label: (character(1))
Label for the new instance.
man: (character(1))
String in the format `[pkg]::[topic]` pointing to a manual page for this object. The referenced help page can be opened via method `$help()`.

Method `train_set()`

Returns the row ids of the i-th training set.

Usage

Resampling$train_set(i)

Arguments

i: (integer(1))
Iteration.

Returns

(integer()) of row ids.

Method `test_set()`

Returns the row ids of the i-th test set.

Usage

Resampling$test_set(i)

Arguments

i: (integer(1))
Iteration.

Returns

(integer()) of row ids.

Examples

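# Create the resampling with default parameters.
same_other_sizes <- mlr3resampling::ResamplingSameOtherSizesCV$new()
# Use 5 cross-validation folds instead of the default 3.
same_other_sizes$param_set$values$folds <- 5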

mlr3resampling documentation built on Nov. 21, 2025, 1:07 a.m.