resample: Resample a Learner on a Task
In mlr3: Machine Learning in R - Next Generation

Description

Runs a resampling (possibly in parallel): Repeatedly apply Learner learner on a training set of Task task to train a model, then use the trained model to predict observations of a test set. Training and test sets are defined by the Resampling resampling.

Usage

resample(
  task,
  learner,
  resampling,
  store_models = FALSE,
  store_backends = TRUE,
  encapsulate = NA_character_,
  allow_hotstart = FALSE,
  clone = c("task", "learner", "resampling"),
  unmarshal = TRUE,
  callbacks = NULL
)

Arguments

`task`	(Task).
`learner`	(Learner).
`resampling`	(Resampling).
`store_models`	(`logical(1)`) Store the fitted model in the resulting object? Set to `TRUE` if you want to further analyse the models or want to extract information like variable importance.
`store_backends`	(`logical(1)`) Keep the DataBackend of the Task in the ResampleResult? Set to `TRUE` if your performance measures require a Task, or to analyse results more conveniently. Set to `FALSE` to reduce the file size and memory footprint after serialization. The current default is `TRUE`, but this will change in a future release.
`encapsulate`	(`character(1)`) If not `NA`, enables encapsulation by setting the field `Learner$encapsulate` to one of the supported values: `"none"` (disable encapsulation), `"try"` (captures errors but output is printed to the console and not logged), `"evaluate"` (execute via evaluate) and `"callr"` (start in external session via callr). If `NA`, encapsulation is not changed, i.e. the settings of the individual learner are active. Additionally, if encapsulation is set to `"evaluate"` or `"callr"`, the fallback learner is set to the featureless learner if the learner does not already have a fallback configured.
`allow_hotstart`	(`logical(1)`) Determines if learner(s) are hot started with trained models in `$hotstart_stack`. See also HotstartStack.
`clone`	(`character()`) Select the input objects to be cloned before proceeding by providing a set with possible values `"task"`, `"learner"` and `"resampling"` for Task, Learner and Resampling, respectively. By default, all input objects are cloned.
`unmarshal`	(`logical(1)`) Whether to unmarshal learners that were marshaled during the execution. If `TRUE`, all models are stored in unmarshaled form. If `FALSE`, all learners (that need marshaling) are stored in marshaled form.
`callbacks`	(List of mlr3misc::Callback) Callbacks to be executed during the resampling process. See CallbackResample and ContextResample for details.

Value

ResampleResult.

Predict Sets

If you want to compare the performance of a learner on the training set with its performance on the test set, you have to configure the Learner to predict on multiple sets by setting the field predict_sets to c("train", "test") (default is "test"). Each set yields a separate Prediction object during resampling. In the next step, you have to configure the measures to operate on the respective Prediction object:

m1 = msr("classif.ce", id = "ce.train", predict_sets = "train")
m2 = msr("classif.ce", id = "ce.test", predict_sets = "test")

The (list of) created measures can finally be passed to $aggregate() or $score().
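
The steps described in this section can be put together as in the following sketch. The choice of the penguins task, the rpart learner and 3-fold cross-validation is an assumption made for illustration only; any Task, Learner and Resampling should work the same way.

```r
library(mlr3)

# Illustrative setup (assumed, not prescribed by this page)
task = tsk("penguins")
learner = lrn("classif.rpart")
resampling = rsmp("cv", folds = 3)

# Let the learner predict on both the training and the test set
learner$predict_sets = c("train", "test")

rr = resample(task, learner, resampling)

# One measure per predict set, distinguished by their ids
m1 = msr("classif.ce", id = "ce.train", predict_sets = "train")
m2 = msr("classif.ce", id = "ce.test", predict_sets = "test")
measures = list(m1, m2)

# Aggregated performance (named by measure id) and per-iteration scores
agg = rr$aggregate(measures)
scores = rr$score(measures)
print(agg)
```

A training error that is clearly lower than the test error is a typical sign of overfitting.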

Parallelization

This function can be parallelized with the future package. One job is one resampling iteration, and all jobs are sent to an apply function from future.apply in a single batch. To select a parallel backend, use future::plan(). More on parallelization can be found in the book: https://mlr3book.mlr-org.com/chapters/chapter10/advanced_technical_aspects_of_mlr3.html
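
As a minimal sketch of selecting a backend, the following runs a 3-fold cross-validation on local background R sessions. The "multisession" backend and the worker count of 2 are assumptions for illustration; the appropriate plan depends on your machine and operating system.

```r
library(mlr3)

# Select a parallel backend: two background R sessions (assumed values)
future::plan("multisession", workers = 2)

task = tsk("penguins")
learner = lrn("classif.rpart")
resampling = rsmp("cv", folds = 3)

# Each of the 3 resampling iterations is one job
rr = resample(task, learner, resampling)

# Switch back to sequential execution when done
future::plan("sequential")

rr$aggregate(msr("classif.ce"))
```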

Progress Bars

This function supports progress bars via the package progressr. Simply wrap the function call in progressr::with_progress() to enable them. Alternatively, call progressr::handlers() with global = TRUE to enable progress bars globally. We recommend the progress package as backend which can be enabled with progressr::handlers("progress").
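
A minimal sketch of the first option, wrapping the call in progressr::with_progress(). Because progressr may not be installed, the example falls back to a plain call in that case; this guard is an addition for illustration and not part of the mlr3 API.

```r
library(mlr3)

task = tsk("penguins")
learner = lrn("classif.rpart")
resampling = rsmp("cv", folds = 3)

if (requireNamespace("progressr", quietly = TRUE)) {
  # Progress is reported while the resampling iterations run
  progressr::with_progress({
    rr = resample(task, learner, resampling)
  })
} else {
  # Fallback without progress reporting
  rr = resample(task, learner, resampling)
}
```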

Logging

mlr3 uses the lgr package for logging. lgr supports multiple log levels which can be queried with getOption("lgr.log_levels").

To suppress output and reduce verbosity, you can lower the log level from the default "info" to "warn":

lgr::get_logger("mlr3")$set_threshold("warn")

To get additional log output for debugging, increase the log level to "debug" or "trace":

lgr::get_logger("mlr3")$set_threshold("debug")

To log to a file or a database, see the documentation of lgr::lgr-package.

Note

The fitted models are discarded after the predictions have been computed in order to reduce memory consumption. If you need access to the models for later analysis, set store_models to TRUE.
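
The following sketch shows the effect of store_models. The task, learner and number of folds are assumed for illustration.

```r
library(mlr3)

task = tsk("penguins")
learner = lrn("classif.rpart")
resampling = rsmp("cv", folds = 3)

# Default: models are discarded after prediction
rr_default = resample(task, learner, resampling)
is.null(rr_default$learners[[1]]$model)

# Keep the fitted model of every resampling iteration
rr_stored = resample(task, learner, resampling, store_models = TRUE)

# One trained learner per iteration; the model is available in $model
models = lapply(rr_stored$learners, function(l) l$model)
class(models[[1]])
```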

Examples

task = tsk("penguins")
learner = lrn("classif.rpart")
resampling = rsmp("cv")

# Explicitly instantiate the resampling for this task for reproducibility
set.seed(123)
resampling$instantiate(task)

rr = resample(task, learner, resampling)
print(rr)

# Retrieve performance
rr$score(msr("classif.ce"))
rr$aggregate(msr("classif.ce"))

# Merge the prediction objects of all resampling iterations
pred = rr$prediction()
pred$confusion

# Repeat resampling with featureless learner
rr_featureless = resample(task, lrn("classif.featureless"), resampling)

# Convert results to BenchmarkResult, then combine them
bmr1 = as_benchmark_result(rr)
bmr2 = as_benchmark_result(rr_featureless)
print(bmr1$combine(bmr2))

mlr3 documentation built on April 4, 2025, 5:08 a.m.