resample: Resampling methods

View source: R/resample.R

resample    R Documentation

Resampling methods

Description

Create resamples of your data, e.g. for model building or validation. "bootstrap" gives the standard bootstrap, i.e. random sampling with replacement, using bootstrap. "strat.sub" creates stratified subsamples using strat.sub. "strat.boot" uses strat.boot, which runs strat.sub and then randomly duplicates some of the training cases to reach the original length of the input (default) or the length defined by target.length.
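The three schemes described above can be sketched in base R. This is an illustration under stated assumptions, not the rtemis implementation: the quantile binning, the 0.75 training fraction, and all variable names are chosen for the example (they mirror the defaults of strat.n.bins and train.p shown under Usage).

```r
# Base R sketch only; not the code used by rtemis.
set.seed(2024)
y <- rnorm(200)
n <- length(y)

# Standard bootstrap: draw n case indices with replacement
boot_idx <- sample.int(n, size = n, replace = TRUE)

# Stratified subsample: bin y into 4 quantile groups (assumed binning rule),
# then draw 75% of the cases within each bin without replacement
breaks <- quantile(y, probs = seq(0, 1, length.out = 5))
bins <- cut(y, breaks = breaks, include.lowest = TRUE, labels = FALSE)
sub_idx <- sort(unlist(
  lapply(split(seq_len(n), bins), function(i) {
    i[sample.int(length(i), size = round(0.75 * length(i)))]
  }),
  use.names = FALSE
))

# Stratified bootstrap: top up the subsample with random duplicates of
# its own cases until it reaches the target length
target_length <- n
n_extra <- target_length - length(sub_idx)
extra <- sub_idx[sample.int(length(sub_idx), size = n_extra, replace = TRUE)]
sboot_idx <- sort(c(sub_idx, extra))
```

In this sketch sboot_idx has the same length as y but contains only cases from the stratified subsample, so the cases left out of sub_idx remain available for testing.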

Usage

resample(
  y,
  n.resamples = 10,
  resampler = c("strat.sub", "strat.boot", "kfold", "bootstrap", "loocv"),
  index = NULL,
  group = NULL,
  stratify.var = y,
  train.p = 0.75,
  strat.n.bins = 4,
  target.length = NROW(y),
  id.strat = NULL,
  rtset = NULL,
  seed = NULL,
  verbosity = TRUE
)

Arguments

y

Vector or data.frame: Usually the outcome; NROW(y) defines sample size

n.resamples

Integer: Number of training/testing sets required

resampler

Character: Type of resampling to perform: "strat.sub", "strat.boot", "kfold", "bootstrap", or "loocv".

index

List: Each element is a vector of training set indices. Use this for manual/pre-defined train/test splits

group

Integer vector, length = length(y): Values define fold membership, e.g. for 10-fold on a dataset with 1000 cases, you could use group = rep(1:10, each = 100)

stratify.var

Numeric vector (optional): Variable used for stratification.

train.p

Float (0, 1): Fraction of cases to assign to the training set for resampler = "strat.sub"

strat.n.bins

Integer: Number of groups to use for stratification for resampler = "strat.sub" / "strat.boot"

target.length

Integer: Number of cases in the training set for resampler = "strat.boot".

id.strat

Vector of IDs which may be replicated: resampling forces all replicates of each ID to appear only in the training set or only in the testing set.

rtset

List: Output of setup.resample (or a named list with the same structure). NOTE: Overrides all other arguments. Default = NULL

seed

Integer: (Optional) Set seed for random number generator, in order to make output reproducible. See ?base::set.seed

verbosity

Logical: If TRUE, print messages to console
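The group, index, and id.strat arguments all constrain which cases land in a training set. A minimal base R sketch of the ideas follows. It is an illustration only, not the rtemis code: it assumes that fold k is the one held out for testing, and the id vector, the 0.75 fraction, and all variable names are hypothetical.

```r
set.seed(2024)  # fix the RNG state so the random split below is reproducible

# group: integer fold membership, here 10 folds of 100 cases each
group <- rep(1:10, each = 100)

# A list of training indices, one element per fold, in the shape that the
# index argument describes (assumes fold k is held out for testing)
train_index <- lapply(sort(unique(group)), function(k) which(group != k))
lengths(train_index)  # 900 training cases in each of the 10 resamples

# id.strat: hypothetical repeated-measures IDs, 250 IDs with 4 rows each
id <- rep(seq_len(250), each = 4)

# Sample at the ID level, then map the chosen IDs back to row indices
ids <- unique(id)
train_ids <- sample(ids, size = floor(0.75 * length(ids)))
train_rows <- which(id %in% train_ids)
test_rows <- which(!(id %in% train_ids))

# No ID contributes rows to both sides of the split
length(intersect(id[train_rows], id[test_rows]))  # 0
```

Sampling IDs rather than rows is what keeps replicated measurements of one subject from leaking between training and testing sets.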

Details

resample is used by multiple rtemis learners, gridSearchLearn, and train_cv. Note that the "kfold" option, which uses kfold, can produce resamples of slightly different lengths when y is short, so avoid any operation that relies on equal-length vectors. For example, you cannot place the resamples in a data.frame and must use a list instead.
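The unequal-length issue is easy to reproduce in base R. The sketch below is not the kfold function itself: the random fold assignment is an assumed, simplified stand-in, and the names fold and train are made up for the example.

```r
set.seed(1)
y <- rnorm(23)  # 23 cases do not divide evenly into 5 folds

# Randomly assign cases to 5 folds of near-equal size (assumed scheme)
fold <- sample(rep_len(1:5, length(y)))

# Training set for fold k: every case outside fold k
train <- lapply(1:5, function(k) which(fold != k))

lengths(train)  # 18 18 18 19 19

# A list holds vectors of different lengths without complaint, whereas
# data.frame(train) stops with an error about differing numbers of rows
res <- try(data.frame(train), silent = TRUE)
inherits(res, "try-error")  # TRUE
```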

Author(s)

E.D. Gennatas

See Also

train_cv

Examples

y <- rnorm(200)
# 10-fold (stratified)
res <- resample(y, 10, "kfold")
# 25 stratified subsamples
res <- resample(y, 25, "strat.sub")
# 100 stratified bootstraps
res <- resample(y, 100, "strat.boot")

egenn/rtemis documentation built on Oct. 28, 2024, 6:30 a.m.