train_cv | R Documentation
train_cv is a high-level function to tune, train, and test an rtemis model by nested resampling, with optional preprocessing and decomposition of input features.
train_cv(
  x,
  y = NULL,
  alg = "ranger",
  train.params = list(),
  .preprocess = NULL,
  .decompose = NULL,
  weights = NULL,
  n.repeats = 1,
  outer.resampling = setup.resample(resampler = "strat.sub", n.resamples = 10),
  inner.resampling = setup.resample(resampler = "kfold", n.resamples = 5),
  bag.fn = median,
  x.name = NULL,
  y.name = NULL,
  save.mods = TRUE,
  save.tune = TRUE,
  bag.fitted = FALSE,
  outer.n.workers = 1,
  print.plot = FALSE,
  plot.fitted = FALSE,
  plot.predicted = TRUE,
  plot.theme = rtTheme,
  print.res.plot = FALSE,
  question = NULL,
  verbose = TRUE,
  res.verbose = FALSE,
  trace = 0,
  headless = FALSE,
  outdir = NULL,
  save.plots = FALSE,
  save.rt = ifelse(!is.null(outdir), TRUE, FALSE),
  save.mod = TRUE,
  save.res = FALSE,
  debug = FALSE,
  ...
)
x: Numeric vector or matrix / data frame of features, i.e. independent variables.
y: Numeric vector of outcome, i.e. dependent variable.
alg: Character: Learner to use. For available options, see select_learn.
train.params: Optional named list of parameters to be passed to the learner specified by alg.
.preprocess: Optional named list of preprocessing parameters, passed to preprocess. Set using setup.preprocess.
.decompose: Optional named list of parameters to be used for decomposition / dimensionality reduction. Set using setup.decompose.
weights: Numeric vector: Weights for cases.
n.repeats: Integer: Number of times to repeat the outer resampling. This was added for completeness; in practice we use either k-fold cross-validation (e.g. 10-fold), especially in large samples, or a higher number of stratified subsamples (e.g. 25) for smaller samples.
outer.resampling: List: Output of setup.resample defining the outer resampling scheme (see the example following this argument list).
inner.resampling: List: Output of setup.resample defining the inner resampling scheme.
bag.fn: Function used to aggregate predictions across repeats.
x.name: Character: Name of predictor dataset.
y.name: Character: Name of outcome.
save.mods: Logical: If TRUE, retain trained models in the object; otherwise discard them (saves space when running many resamples).
save.tune: Logical: If TRUE, save the best.tune data frame for each resample (output of gridSearchLearn).
bag.fitted: Logical: If TRUE, also use all models to get a bagged prediction on the full sample. To get a bagged prediction on new data using the same models, use predict.rtModCV.
outer.n.workers: Integer: Number of cores to use for the outer, i.e. testing, resamples. You are likely parallelizing either the inner (tuning) resamples or the learner itself; do not parallelize the parallelization.
print.plot: Logical: If TRUE, produce plots of the results.
plot.fitted: Logical: If TRUE, plot True (y) vs Fitted.
plot.predicted: Logical: If TRUE, plot True (y.test) vs Predicted.
plot.theme: Character: Plot theme: "zero", "dark", "box", "darkbox".
print.res.plot: Logical: If TRUE, print a model performance plot for each resample.
question: Character: The question you are attempting to answer with this model, in plain language.
verbose: Logical: If TRUE, print summary to screen.
res.verbose: Logical: Passed to each individual learner's verbose argument.
trace: Integer: Print additional information if > 0 (not really used).
headless: Logical: If TRUE, turn off all plotting.
outdir: Character: Path where output should be saved.
save.plots: Logical: If TRUE, save plots to outdir.
save.rt: Logical: If TRUE and outdir is defined, save the output object to outdir.
save.mod: Logical: If TRUE, save all output to an RDS file in outdir.
save.res: Logical: If TRUE, save the full output of each model trained on different resamples under subdirectories of outdir.
debug: Logical: If TRUE, enable debugging output.
...: Additional train.params to be passed to the learner; these will be concatenated with train.params.
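To illustrate how these arguments fit together, the sketch below tunes and cross-validates a ranger model using the outer and inner resampling schemes from the defaults above. This is a hedged example: the mtry grid passed via train.params is purely illustrative, and the hyperparameter names accepted depend on the learner selected with alg.

mod <- train_cv(
  x, y,
  alg = "ranger",
  train.params = list(mtry = c(3, 5, 7)),  # illustrative tuning grid; valid names depend on the learner
  outer.resampling = setup.resample(resampler = "strat.sub", n.resamples = 10),
  inner.resampling = setup.resample(resampler = "kfold", n.resamples = 5),
  outer.n.workers = 1,
  verbose = TRUE
)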
Note on resampling: You should never use an outer resampling method with replacement if you will also be using an inner resampling (for tuning). The duplicated cases from the outer resampling may appear both in the training and testing sets of the inner resamples, leading to underestimated testing error.
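A brief sketch of the safe configuration described above, using only resampler names that appear in the defaults:

# Outer resampling without replacement is safe to combine with inner tuning resamples
outer_res <- setup.resample(resampler = "strat.sub", n.resamples = 25)
inner_res <- setup.resample(resampler = "kfold", n.resamples = 5)
# A with-replacement (bootstrap-type) outer resampler would duplicate cases that can
# end up in both the training and testing sets of the inner resamples.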
If there is an error while running either the outer or inner resamples in parallel, the error message returned by R will likely be unhelpful. Repeat the command after setting both the inner and outer resampling to run on a single core, which should provide an informative message.
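A minimal sketch of that serial re-run: outer.n.workers (from the usage above) controls the outer loop, while the inner tuning and the learner itself are assumed to expose their own core settings.

# Re-run on a single core so R can surface an informative error message
mod <- train_cv(
  x, y,
  alg = "ranger",
  outer.n.workers = 1  # single core for the outer (testing) resamples
  # also set the inner tuning and/or the learner to a single core via their own options
)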
The train_cv command is replacing elevate.
Note: specifying id.strat for the inner resampling is not yet supported.
Returns an object of class rtModCV (Regression) or rtModCVClass (Classification), with the following components:
error.test.repeats: the mean or aggregate error, as appropriate, for each repeat
error.test.repeats.mean: the mean error across all repeats, i.e. the mean of error.test.repeats
error.test.repeats.sd: if n.repeats > 1, the standard deviation of error.test.repeats
error.test.res: the error for each resample, for each repeat
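As a rough sketch of how these components might be inspected, assuming the usual $ accessor on the returned object:

mod <- train_cv(x, y)
mod$error.test.repeats       # error for each repeat
mod$error.test.repeats.mean  # mean error across repeats
mod$error.test.res           # per-resample error within each repeat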
E.D. Gennatas
## Not run:
# Regression
x <- rnormmat(100, 50)
w <- rnorm(50)
y <- x %*% w + rnorm(100)  # noise vector matches the 100 cases
mod <- train_cv(x, y)

# Classification
data(Sonar, package = "mlbench")
mod <- train_cv(Sonar)

## End(Not run)
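For completeness, a hedged sketch of predicting on new data with the cross-validated models, continuing from the regression example above; predict.rtModCV (referenced under bag.fitted) is assumed to dispatch through the usual predict() generic with new data as its second argument.

## Not run:
x_new <- rnormmat(20, 50)    # new data with the same 50 features
preds <- predict(mod, x_new) # assumed to call predict.rtModCV
## End(Not run)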