resample | R Documentation |
Create resamples of your data, e.g. for model building or validation.
"bootstrap" gives the standard bootstrap, i.e. random sampling with replacement, using
bootstrap; "strat.sub" creates stratified subsamples using strat.sub;
and "strat.boot" uses strat.boot, which runs strat.sub and then
randomly duplicates some of the training cases to reach the original length of the input
(default) or the length defined by target.length.
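The three strategies described above can be illustrated with a minimal base-R sketch. This is not the rtemis implementation: the quantile-based binning, the rounding of stratum sizes, and all variable names (`boot_idx`, `sub_idx`, `strat_boot_idx`, `train_p`, `n_bins`, `target_length`) are assumptions made for illustration only.

```r
set.seed(2024)
y <- rnorm(200)
n <- length(y)

# Standard bootstrap: draw n case indices with replacement
boot_idx <- sample.int(n, n, replace = TRUE)

# Stratified subsample (assumed scheme): bin y into quantile-based strata,
# then draw a fraction train_p of each stratum without replacement
train_p <- 0.75
n_bins <- 4
breaks <- quantile(y, probs = seq(0, 1, length.out = n_bins + 1))
strata <- cut(y, breaks = breaks, include.lowest = TRUE, labels = FALSE)
sub_idx <- sort(unlist(
  lapply(split(seq_len(n), strata), function(i) {
    i[sample.int(length(i), floor(train_p * length(i)))]
  }),
  use.names = FALSE
))

# Stratified bootstrap (assumed scheme): start from the stratified subsample
# and randomly duplicate some of its cases until target_length is reached
target_length <- n
n_extra <- target_length - length(sub_idx)
extra <- sub_idx[sample.int(length(sub_idx), n_extra, replace = TRUE)]
strat_boot_idx <- sort(c(sub_idx, extra))
```

In this sketch the stratified bootstrap contains only cases that were already in the stratified subsample, so the held-out cases stay out of the training set even after duplication.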
resample(
y,
n.resamples = 10,
resampler = c("strat.sub", "strat.boot", "kfold", "bootstrap", "loocv"),
index = NULL,
group = NULL,
stratify.var = y,
train.p = 0.75,
strat.n.bins = 4,
target.length = NROW(y),
id.strat = NULL,
rtset = NULL,
seed = NULL,
verbosity = TRUE
)
y |
Vector or data.frame: Usually the outcome; |
n.resamples |
Integer: Number of training/testing sets required |
resampler |
Character: Type of resampling to perform: "bootstrap", "kfold", "loocv", "strat.boot", "strat.sub". |
index |
List where each element is a vector of training set indices. Use this for manual/pre-defined train/test splits |
group |
Integer, vector, length = |
stratify.var |
Numeric vector (optional): Variable used for stratification. |
train.p |
Float (0, 1): Fraction of cases to assign to training set for
|
strat.n.bins |
Integer: Number of groups to use for stratification for
|
target.length |
Integer: Number of cases for training set for
|
id.strat |
Vector of IDs, which may be replicated: resampling forces all replicates of each ID to appear only in the training set or only in the testing set. |
rtset |
List: Output of setup.resample (or a named list with the same structure). NOTE: Overrides all other arguments. Default = NULL |
seed |
Integer: (Optional) Set seed for random number generator, in order to make
output reproducible. See |
verbosity |
Logical: If TRUE, print messages to console |
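The constraint described for id.strat, and the list shape described for index, can be sketched in base R as follows. This is an illustration under assumptions, not how rtemis handles id.strat internally: the data layout (20 hypothetical subjects with 3 rows each) and the names `ids`, `make_split`, and `manual_index` are invented for the example.

```r
set.seed(1)

# Hypothetical repeated-measures layout: 20 subjects, 3 rows each
ids <- rep(1:20, each = 3)

# Split at the ID level so that every replicate of an ID lands on the same side
make_split <- function(ids, train_p = 0.75) {
  unique_ids <- unique(ids)
  n_train <- floor(train_p * length(unique_ids))
  train_ids <- unique_ids[sample.int(length(unique_ids), n_train)]
  which(ids %in% train_ids)  # row indices of the training set
}

# A list where each element is a vector of training set indices,
# i.e. the shape described for the index argument
manual_index <- lapply(1:5, function(i) make_split(ids))

# Check: no ID appears in both the training and the testing rows
no_leak <- vapply(manual_index, function(train_idx) {
  test_idx <- setdiff(seq_along(ids), train_idx)
  length(intersect(ids[train_idx], ids[test_idx])) == 0
}, logical(1))
```

Splitting on unique IDs first and mapping back to rows afterwards is what keeps correlated replicates from leaking between training and testing sets.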
resample is used by multiple rtemis learners, gridSearchLearn, and train_cv.
Note that the "kfold" option, which uses kfold, results in resamples of slightly
different lengths for short y, so avoid all operations that rely on equal-length
vectors. For example, you cannot place the resamples in a data.frame; use a list
instead.
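The unequal-length behaviour is easy to reproduce with a minimal base-R k-fold sketch. This is not the kfold function itself; the fold assignment scheme and the names `fold_id`, `train_idx`, and `train_sizes` are assumptions for illustration.

```r
set.seed(1)
n <- 23  # deliberately not a multiple of k
k <- 5

# Assign each case to one of k folds, as evenly as possible, in random order
fold_id <- sample(rep(seq_len(k), length.out = n))

# Training set for fold f = every case that is not in fold f
train_idx <- lapply(seq_len(k), function(f) which(fold_id != f))

# Folds of size 5 leave 18 training cases; folds of size 4 leave 19
train_sizes <- lengths(train_idx)

# A data.frame needs equal-length columns, so this attempt fails
as_df <- try(data.frame(train_idx), silent = TRUE)
df_failed <- inherits(as_df, "try-error")
```

Keeping the resamples in a list, and iterating with lapply or vapply, avoids any dependence on equal lengths.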
E.D. Gennatas
train_cv
y <- rnorm(200)
# 10-fold (stratified)
res <- resample(y, 10, "kfold")
# 25 stratified subsamples
res <- resample(y, 25, "strat.sub")
# 100 stratified bootstraps
res <- resample(y, 100, "strat.boot")
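Once a set of resamples exists, a typical validation loop fits on each training set and evaluates on the remaining cases. The sketch below runs without rtemis: `res_standin` is a hypothetical stand-in built with base R, and it assumes, as described for the index argument, that each resample can be treated as an integer vector of training indices. The data and the lm model are likewise invented for the example.

```r
set.seed(42)
x <- rnorm(200)
dat <- data.frame(x = x, y = 2 * x + rnorm(200))

# Stand-in for a list of resamples (assumption): 10 training sets of 150 cases
res_standin <- lapply(1:10, function(i) sort(sample.int(nrow(dat), 150)))

# Fit on each training set, score on the held-out cases
test_mse <- vapply(res_standin, function(train_idx) {
  fit <- lm(y ~ x, data = dat[train_idx, ])
  test <- dat[-train_idx, ]
  mean((test$y - predict(fit, newdata = test))^2)
}, numeric(1))

# Average held-out error across resamples
mean_test_mse <- mean(test_mse)
```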