train_cv: Tune, Train, and Test an 'rtemis' Learner by Nested Resampling

View source: R/train_cv.R

train_cv — R Documentation

Tune, Train, and Test an rtemis Learner by Nested Resampling

Description

train_cv is a high-level function to tune, train, and test an rtemis model by nested resampling, with optional preprocessing and decomposition of input features.

Usage

train_cv(
  x,
  y = NULL,
  alg = "ranger",
  train.params = list(),
  .preprocess = NULL,
  .decompose = NULL,
  weights = NULL,
  n.repeats = 1,
  outer.resampling = setup.resample(resampler = "strat.sub", n.resamples = 10),
  inner.resampling = setup.resample(resampler = "kfold", n.resamples = 5),
  bag.fn = median,
  x.name = NULL,
  y.name = NULL,
  save.mods = TRUE,
  save.tune = TRUE,
  bag.fitted = FALSE,
  outer.n.workers = 1,
  print.plot = FALSE,
  plot.fitted = FALSE,
  plot.predicted = TRUE,
  plot.theme = rtTheme,
  print.res.plot = FALSE,
  question = NULL,
  verbose = TRUE,
  res.verbose = FALSE,
  trace = 0,
  headless = FALSE,
  outdir = NULL,
  save.plots = FALSE,
  save.rt = ifelse(!is.null(outdir), TRUE, FALSE),
  save.mod = TRUE,
  save.res = FALSE,
  debug = FALSE,
  ...
)

Arguments

x

Numeric vector or matrix / data frame of features i.e. independent variables

y

Numeric vector of outcome, i.e. dependent variable

alg

Character: Learner to use. See select_learn for available options

train.params

Optional named list of parameters to be passed to alg. All parameters can also be passed as part of ...

.preprocess

Optional named list of parameters to be passed to preprocess. Set using setup.preprocess, e.g. .preprocess = setup.preprocess(impute = TRUE)

.decompose

Optional named list of parameters to be used for decomposition / dimensionality reduction. Set using setup.decompose, e.g. .decompose = setup.decompose("ica", 12)

weights

Numeric vector: Case weights. For classification, weights takes precedence over ifw: if weights are provided, ifw is not used. Leave NULL if setting ifw = TRUE.

n.repeats

Integer: Number of times to repeat the outer resampling. This was added for completeness; in practice, use either k-fold cross-validation (e.g. 10-fold), especially with large samples, or a higher number of stratified subsamples (e.g. 25) for smaller samples

outer.resampling

List: Output of setup.resample to define outer resampling scheme

inner.resampling

List: Output of setup.resample to define inner resampling scheme

bag.fn

Function used to aggregate predictions if bag.fitted = TRUE. Default = median

x.name

Character: Name of predictor dataset

y.name

Character: Name of outcome

save.mods

Logical: If TRUE, retain trained models in object, otherwise discard (save space if running many resamples).

save.tune

Logical: If TRUE, save the best.tune data frame for each resample (output of gridSearchLearn)

bag.fitted

Logical: If TRUE, use all models to also get a bagged prediction on the full sample. To get a bagged prediction on new data using the same models, use predict.rtModCV

outer.n.workers

Integer: Number of cores to use for the outer, i.e. testing, resamples. Parallelization usually belongs either in the inner (tuning) resamples or in the learner itself; avoid nesting parallelization inside parallelization

print.plot

Logical: if TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted.

plot.fitted

Logical: if TRUE, plot True (y) vs Fitted

plot.predicted

Logical: if TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test

plot.theme

Character: "zero", "dark", "box", "darkbox"

print.res.plot

Logical: If TRUE, print a model performance plot for each resample. Default = FALSE

question

Character: the question you are attempting to answer with this model, in plain language.

verbose

Logical: If TRUE, print summary to screen.

res.verbose

Logical: Passed to each individual learner's verbose argument

trace

Integer: If > 0, print additional information (not currently used extensively).

headless

Logical: If TRUE, turn off all plotting.

outdir

Character: Path where output should be saved

save.plots

Logical: If TRUE, save plots to outdir

save.rt

Logical: If TRUE and outdir is set, save all models to outdir

save.mod

Logical: If TRUE, save all output to an RDS file in outdir. save.mod is TRUE by default if an outdir is defined. If set to TRUE and no outdir is defined, outdir defaults to paste0("./s.", mod.name)

save.res

Logical: If TRUE, save the full output of each model trained on different resamples under subdirectories of outdir

debug

Logical: If TRUE, sets outer.n.workers to 1, options(error=recover), and options(warn = 2)

...

Additional train.params to be passed to the learner; will be concatenated with train.params

Details

  • Note on resampling: You should never use an outer resampling method with replacement if you will also be using an inner resampling (for tuning). The duplicated cases from the outer resampling may appear both in the training and testing sets of the inner resamples, leading to underestimated testing error.

  • If there is an error while running either the outer or inner resamples in parallel, the error message returned by R will likely be unhelpful. Repeat the command after setting both the inner and outer resampling to run on a single core, which should produce an informative error message.

train_cv replaces the older elevate function. Note: specifying id.strat for the inner resampling is not yet supported.
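For instance, the outer and inner schemes described above can be configured independently. A minimal sketch (not run; assumes the rtemis package is installed and numeric x and y are already defined):

```r
## Not run:
library(rtemis)

# 10-fold outer (testing) resamples with 5-fold inner (tuning) resamples
mod <- train_cv(
  x, y,
  alg = "ranger",
  outer.resampling = setup.resample(resampler = "kfold", n.resamples = 10),
  inner.resampling = setup.resample(resampler = "kfold", n.resamples = 5),
  outer.n.workers = 1  # keep the outer loop serial; ranger parallelizes internally
)

## End(Not run)
```

Keeping outer.n.workers = 1 follows the advice under outer.n.workers: parallelize in at most one layer.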

Value

Object of class rtModCV (Regression) or rtModCVClass (Classification)

error.test.repeats

the mean or aggregate error, as appropriate, for each repeat

error.test.repeats.mean

the mean error of all repeats, i.e. the mean of error.test.repeats

error.test.repeats.sd

if n.repeats > 1, the standard deviation of error.test.repeats

error.test.res

the error for each resample, for each repeat

Author(s)

E.D. Gennatas

Examples

## Not run: 
# Regression

x <- rnormmat(100, 50)
w <- rnorm(50)
y <- x %*% w + rnorm(100)
mod <- train_cv(x, y)

# Classification

data(Sonar, package = "mlbench")
mod <- train_cv(Sonar)

## End(Not run)
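A further sketch combining preprocessing with prediction on held-out rows (not run; assumes the rtemis package is installed and uses predict on the rtModCV object, as described under bag.fitted):

```r
## Not run:
library(rtemis)

x <- rnormmat(200, 20)
w <- rnorm(20)
y <- x %*% w + rnorm(200)

# Train on the first 150 rows, imputing any missing values during preprocessing
mod <- train_cv(
  x[1:150, ], y[1:150],
  .preprocess = setup.preprocess(impute = TRUE)
)

# Predict on the remaining held-out rows using the trained resample models
predicted <- predict(mod, x[151:200, ])

## End(Not run)
```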

egenn/rtemis documentation built on May 4, 2024, 7:40 p.m.