ResamplingVariableSizeTrainCV | R Documentation
ResamplingVariableSizeTrainCV defines how a task is partitioned for resampling, for example in resample() or benchmark().
Resampling objects can be instantiated on a Task. After instantiation, the sets can be accessed via $train_set(i) and $test_set(i), respectively.
A supervised learning algorithm takes a train set as input and outputs a prediction function, which can then be applied to a test set. How many train samples are required to get accurate predictions on a test set? Cross-validation with train sets of variable size can be used to answer this question.
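As a rough illustration of this question, the base R sketch below holds out one test set, fits a model on nested train subsets of increasing size, and reports the test error for each size. The simulated data, the cubic lm() model, and the chosen sizes are illustrative assumptions; the sketch uses neither mlr3 nor ResamplingVariableSizeTrainCV.

```r
# Illustrative assumptions: simulated sin() data and a cubic lm() learner.
set.seed(1)
n <- 200
x <- runif(n, -3, 3)
sim <- data.frame(x = x, y = sin(x) + rnorm(n, sd = 0.3))

# Hold out one fixed test set; put the remaining rows in a random order.
test_ids <- sample(n, 50)
pool_ids <- sample(setdiff(seq_len(n), test_ids))
train_sizes <- c(10, 20, 40, 80, 150)

# For each size, train on the first `size` rows of the random order
# and compute the mean squared error on the held-out test set.
test_mse <- sapply(train_sizes, function(size) {
  train_ids <- pool_ids[seq_len(size)]
  fit <- lm(y ~ poly(x, 3), data = sim[train_ids, ])
  pred <- predict(fit, newdata = sim[test_ids, ])
  mean((sim$y[test_ids] - pred)^2)
})
data.frame(train_size = train_sizes, test_mse = test_mse)
```

Plotting test_mse against train_size gives a learning curve; repeating this over several test folds and random orderings is roughly what the resampling described below organizes.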
ResamplingVariableSizeTrainCV supports stratified sampling. The stratification variables are assumed to be discrete and must be stored in the Task with column role "stratum". In case of multiple stratification variables, each combination of the values of the stratification variables forms a stratum.
ResamplingVariableSizeTrainCV does not support grouping of observations.
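How several discrete variables combine into strata can be illustrated in base R with interaction(). The data frame and its columns sex and smoke are hypothetical; in a real Task these columns would instead carry the "stratum" column role.

```r
# Hypothetical data with two discrete stratification variables.
task_data <- data.frame(
  sex   = c("F", "F", "M", "M", "F", "M"),
  smoke = c("yes", "no", "yes", "yes", "no", "no")
)
# Each observed combination of values defines one stratum;
# drop = TRUE removes combinations that never occur.
stratum <- interaction(task_data$sex, task_data$smoke, drop = TRUE)
table(stratum)
```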
The number of cross-validation folds should be defined as the fold parameter. For each fold ID, the observations assigned to that fold form the test set, and train sets of variable size are drawn from the remaining observations.
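The fold logic can be sketched in base R as follows. The helper assign_folds() is hypothetical and is not the package's implementation; it assumes that stratified folds are obtained by shuffling the observations within each stratum and dealing them into folds in turn.

```r
# Hypothetical helper: stratified fold assignment. Within each stratum,
# observations are shuffled and dealt round-robin into folds 1..folds.
assign_folds <- function(stratum, folds) {
  fold <- integer(length(stratum))
  for (s in unique(stratum)) {
    ids <- which(stratum == s)
    ids <- ids[sample.int(length(ids))]  # random order within the stratum
    fold[ids] <- rep_len(seq_len(folds), length(ids))
  }
  fold
}

set.seed(42)
stratum <- rep(c("a", "b"), times = c(12, 6))
fold <- assign_folds(stratum, folds = 3)
table(stratum, fold)  # each stratum is spread evenly over the folds

# For fold ID 1, its observations form the test set; the remaining
# observations are the pool from which variable size train sets are drawn.
test_ids <- which(fold == 1)
train_pool <- which(fold != 1)
```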
The random_seeds parameter controls the number of random orderings of the train set that are considered. For each random order of the train set, the min_train_data parameter controls the size of the smallest stratum in the smallest train set considered. To determine the other train set sizes, an equally spaced grid on the log scale is used, from min_train_data to the largest train set size (all data not in the test set). The number of train set sizes in this grid is determined by the train_sizes parameter.
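A minimal base R sketch of the size grid and the random orderings is shown below. It assumes there are no strata (so min_train_data is simply the smallest train set size), that grid points are rounded to whole row counts, and that each train set is a prefix of one random ordering; the helper size_grid() is hypothetical, and the package may differ in these details.

```r
# Hypothetical helper: train set sizes on an equally spaced log-scale grid.
size_grid <- function(min_train_data, max_train_size, train_sizes) {
  log_grid <- seq(log(min_train_data), log(max_train_size),
                  length.out = train_sizes)
  unique(round(exp(log_grid)))  # round to whole numbers of rows
}

train_pool <- 1:100  # hypothetical row ids not in the test set
sizes <- size_grid(min_train_data = 10, max_train_size = length(train_pool),
                   train_sizes = 5)
sizes

# Assumption: one random ordering of the train pool per seed, and each
# train set is a prefix of that ordering, so smaller sets are nested in larger ones.
random_seeds <- 3
train_sets <- lapply(seq_len(random_seeds), function(seed) {
  set.seed(seed)
  ordering <- train_pool[sample.int(length(train_pool))]
  lapply(sizes, function(size) ordering[seq_len(size)])
})
lengths(train_sets[[1]])
```

Spacing the sizes on the log scale puts more grid points at small train sizes, where test error usually changes fastest.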
new()
Creates a new instance of this R6 class.
Usage: Resampling$new(id, param_set = ps(), duplicated_ids = FALSE, label = NA_character_, man = NA_character_)
Arguments:
id (character(1)): Identifier for the new instance.
param_set (paradox::ParamSet): Set of hyperparameters.
duplicated_ids (logical(1)): Set to TRUE if this resampling strategy may have duplicated row ids in a single training set or test set.
label (character(1)): Label for the new instance.
man (character(1)): String in the format [pkg]::[topic] pointing to a manual page for this object. The referenced help page can be opened via method $help().
train_set()
Returns the row ids of the i-th training set.
Usage: Resampling$train_set(i)
Arguments:
i (integer(1)): Iteration.
Returns: (integer()) of row ids.
test_set()
Returns the row ids of the i-th test set.
Usage: Resampling$test_set(i)
Arguments:
i (integer(1)): Iteration.
Returns: (integer()) of row ids.
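To show what "the i-th training set" and "the i-th test set" can mean for this resampling, the base R sketch below enumerates iterations as combinations of test fold, random seed, and train size, and exposes train_set(i) and test_set(i) as closures. The helper make_instance(), its defaults, and this iteration layout are hypothetical assumptions for illustration, not the package's internals; strata are ignored.

```r
# Hypothetical sketch: iteration i maps to one (train size, seed, test fold)
# combination. NOT the package's internals; strata are ignored.
make_instance <- function(n, folds = 3, random_seeds = 2, sizes = c(5, 10)) {
  fold <- rep_len(seq_len(folds), n)[sample.int(n)]  # random fold IDs
  iterations <- expand.grid(train_size = sizes, seed = seq_len(random_seeds),
                            test_fold = seq_len(folds))
  orderings <- lapply(seq_len(random_seeds), function(s) sample.int(n))
  list(
    iters = nrow(iterations),
    test_set = function(i) which(fold == iterations$test_fold[i]),
    train_set = function(i) {
      it <- iterations[i, ]
      ord <- orderings[[it$seed]]
      pool <- ord[fold[ord] != it$test_fold]  # shuffled rows outside the test fold
      pool[seq_len(it$train_size)]            # prefix of the chosen size
    }
  )
}

set.seed(7)
inst <- make_instance(n = 30)
inst$iters
inst$train_set(2)
inst$test_set(2)
```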
Examples
# Create a new instance and print it.
(var_sizes <- mlr3resampling::ResamplingVariableSizeTrainCV$new())