train_cv: Tune, Train, and Test an 'rtemis' Learner by Nested Resampling

View source: R/train_cv.R

train_cv — R Documentation

Tune, Train, and Test an rtemis Learner by Nested Resampling

Description

train_cv is a high-level function to tune, train, and test an rtemis model by nested resampling, with optional preprocessing and decomposition of input features.

Usage

train_cv(
  x,
  y = NULL,
  alg = "ranger",
  train.params = list(),
  .preprocess = NULL,
  .decompose = NULL,
  weights = NULL,
  n.repeats = 1,
  outer.resampling = setup.resample(resampler = "strat.sub", n.resamples = 10),
  inner.resampling = setup.resample(resampler = "kfold", n.resamples = 5),
  bag.fn = median,
  x.name = NULL,
  y.name = NULL,
  save.mods = TRUE,
  save.tune = TRUE,
  bag.fitted = FALSE,
  outer.n.workers = 1,
  print.plot = FALSE,
  plot.fitted = FALSE,
  plot.predicted = TRUE,
  plot.theme = rtTheme,
  print.res.plot = FALSE,
  question = NULL,
  verbose = TRUE,
  res.verbose = FALSE,
  trace = 0,
  headless = FALSE,
  outdir = NULL,
  save.plots = FALSE,
  save.rt = ifelse(!is.null(outdir), TRUE, FALSE),
  save.mod = TRUE,
  save.res = FALSE,
  debug = FALSE,
  ...
)

Arguments

x

Numeric vector or matrix / data frame of features i.e. independent variables

y

Numeric vector of outcome, i.e. dependent variable

alg

Character: Learner to use. See select_learn for available options

train.params

Optional named list of parameters to be passed to alg. All parameters can also be passed as part of ...

.preprocess

Optional named list of parameters to be passed to preprocess. Set using setup.preprocess, e.g. .preprocess = setup.preprocess(impute = TRUE)

.decompose

Optional named list of parameters to be used for decomposition / dimensionality reduction. Set using setup.decompose, e.g. .decompose = setup.decompose("ica", 12)

weights

Numeric vector: Case weights. For classification, weights takes precedence over ifw: if weights are provided, ifw is not used. Leave NULL if setting ifw = TRUE.

n.repeats

Integer: Number of times to repeat the outer resampling. This was added for completeness; in practice, use either k-fold cross-validation (e.g. 10-fold), especially with large samples, or a higher number of stratified subsamples (e.g. 25) for smaller samples

outer.resampling

List: Output of setup.resample to define outer resampling scheme

inner.resampling

List: Output of setup.resample to define inner resampling scheme

bag.fn

Function used to aggregate predictions if bag.fitted = TRUE. Default = median

x.name

Character: Name of predictor dataset

y.name

Character: Name of outcome

save.mods

Logical: If TRUE, retain trained models in object, otherwise discard (save space if running many resamples).

save.tune

Logical: If TRUE, save the best.tune data frame for each resample (output of gridSearchLearn)

bag.fitted

Logical: If TRUE, use all models to also get a bagged prediction on the full sample. To get a bagged prediction on new data using the same models, use predict.rtModCV

outer.n.workers

Integer: Number of cores to use for the outer, i.e. testing, resamples. Parallelization usually belongs either in the inner (tuning) resamples or in the learner itself; avoid nesting parallelization inside parallelization

print.plot

Logical: if TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted.

plot.fitted

Logical: if TRUE, plot True (y) vs Fitted

plot.predicted

Logical: if TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test

plot.theme

Character: "zero", "dark", "box", "darkbox"

print.res.plot

Logical: If TRUE, print a model performance plot for each resample. Default = FALSE

question

Character: the question you are attempting to answer with this model, in plain language.

verbose

Logical: If TRUE, print summary to screen.

res.verbose

Logical: Passed to each individual learner's verbose argument

trace

Integer: If > 0, print additional information (not currently used extensively).

headless

Logical: If TRUE, turn off all plotting.

outdir

Character: Path where output should be saved

save.plots

Logical: If TRUE, save plots to outdir

save.rt

Logical: If TRUE and outdir is set, save all models to outdir

save.mod

Logical: If TRUE, save all output to an RDS file in outdir. save.mod is TRUE by default if an outdir is defined. If set to TRUE and no outdir is defined, outdir defaults to paste0("./s.", mod.name)

save.res

Logical: If TRUE, save the full output of each model trained on different resamples under subdirectories of outdir

debug

Logical: If TRUE, sets outer.n.workers to 1, options(error=recover), and options(warn = 2)

...

Additional train.params to be passed to the learner; will be concatenated with train.params

Details

  • Note on resampling: You should never use an outer resampling method with replacement if you will also be using an inner resampling (for tuning). The duplicated cases from the outer resampling may appear both in the training and testing sets of the inner resamples, leading to underestimated testing error.

  • If there is an error while running either the outer or inner resamples in parallel, the error message returned by R will likely be unhelpful. Repeat the command after setting both the inner and outer resampling to run on a single core, which should produce an informative error message.

train_cv replaces the older elevate function. Note: specifying id.strat for the inner resampling is not yet supported.
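For instance, the outer and inner schemes described above can be configured independently. A minimal sketch (not run; assumes the rtemis package is installed and numeric x and y are already defined):

```r
## Not run:
library(rtemis)

# 10-fold outer (testing) resamples with 5-fold inner (tuning) resamples
mod <- train_cv(
  x, y,
  alg = "ranger",
  outer.resampling = setup.resample(resampler = "kfold", n.resamples = 10),
  inner.resampling = setup.resample(resampler = "kfold", n.resamples = 5),
  outer.n.workers = 1  # keep the outer loop serial; ranger parallelizes internally
)

## End(Not run)
```

Keeping outer.n.workers = 1 follows the advice under outer.n.workers: parallelize in at most one layer.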

Value

Object of class rtModCV (Regression) or rtModCVClass (Classification)

error.test.repeats

the mean or aggregate error, as appropriate, for each repeat

error.test.repeats.mean

the mean error of all repeats, i.e. the mean of error.test.repeats

error.test.repeats.sd

if n.repeats > 1, the standard deviation of error.test.repeats

error.test.res

the error for each resample, for each repeat

Author(s)

E.D. Gennatas

Examples

## Not run: 
# Regression

x <- rnormmat(100, 50)
w <- rnorm(50)
y <- x %*% w + rnorm(100)
mod <- train_cv(x, y)

# Classification

data(Sonar, package = "mlbench")
mod <- train_cv(Sonar)

## End(Not run)
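A further sketch combining preprocessing with prediction on held-out rows (not run; assumes the rtemis package is installed and uses predict on the rtModCV object, as described under bag.fitted):

```r
## Not run:
library(rtemis)

x <- rnormmat(200, 20)
w <- rnorm(20)
y <- x %*% w + rnorm(200)

# Train on the first 150 rows, imputing any missing values during preprocessing
mod <- train_cv(
  x[1:150, ], y[1:150],
  .preprocess = setup.preprocess(impute = TRUE)
)

# Predict on the remaining held-out rows using the trained resample models
predicted <- predict(mod, x[151:200, ])

## End(Not run)
```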

egenn/rtemis documentation built on May 4, 2024, 7:40 p.m.