benchmark: Benchmark Multiple Learners on Multiple Tasks
In mlr3: Machine Learning in R - Next Generation

benchmark

R Documentation

Benchmark Multiple Learners on Multiple Tasks

Description

Runs a benchmark on arbitrary combinations of tasks (Task), learners (Learner), and resampling strategies (Resampling), possibly in parallel.

For large-scale benchmarking we recommend to use the mlr3batchmark package. This package runs benchmark experiments on high-performance computing clusters and handles failed experiments.

Usage

benchmark(
  design,
  store_models = FALSE,
  store_backends = TRUE,
  encapsulate = NA_character_,
  allow_hotstart = FALSE,
  clone = c("task", "learner", "resampling"),
  unmarshal = TRUE,
  callbacks = NULL
)

Arguments

`design`	(`data.frame()`) Data frame (or `data.table::data.table()`) with three columns: "task", "learner", and "resampling". Each row defines a resampling by providing a Task, Learner and an instantiated Resampling strategy. The helper function `benchmark_grid()` can assist in generating an exhaustive design (see examples) and instantiate the Resamplings per Task. Additionally, you can set the additional column 'param_values', see `benchmark_grid()`.
`store_models`	(`logical(1)`) Store the fitted model in the resulting object= Set to `TRUE` if you want to further analyse the models or want to extract information like variable importance.
`store_backends`	(`logical(1)`) Keep the DataBackend of the Task in the ResampleResult? Set to `TRUE` if your performance measures require a Task, or to analyse results more conveniently. Set to `FALSE` to reduce the file size and memory footprint after serialization. The current default is `TRUE`, but this eventually will be changed in a future release.
`encapsulate`	(`character(1)`) If not `NA`, enables encapsulation by setting the field `Learner$encapsulate` to one of the supported values: `"none"` (disable encapsulation), `"try"` (captures errors but output is printed to the console and not logged), `"evaluate"` (execute via evaluate) and `"callr"` (start in external session via callr). If `NA`, encapsulation is not changed, i.e. the settings of the individual learner are active. Additionally, if encapsulation is set to `"evaluate"` or `"callr"`, the fallback learner is set to the featureless learner if the learner does not already have a fallback configured.
`allow_hotstart`	(`logical(1)`) Determines if learner(s) are hot started with trained models in `⁠$hotstart_stack⁠`. See also HotstartStack.
`clone`	(`character()`) Select the input objects to be cloned before proceeding by providing a set with possible values `"task"`, `"learner"` and `"resampling"` for Task, Learner and Resampling, respectively. Per default, all input objects are cloned.
`unmarshal`	`Learner` Whether to unmarshal learners that were marshaled during the execution. If `TRUE` all models are stored in unmarshaled form. If `FALSE`, all learners (that need marshaling) are stored in marshaled form.
`callbacks`	(List of mlr3misc::Callback) Callbacks to be executed during the resampling process. See CallbackResample and ContextResample for details.

Value

BenchmarkResult.

Stochasticity

Note that uninstantiated Resamplings are instantiated on the task, making the function stochastic even in case of deterministic learners.

Predict Sets

If you want to compare the performance of a learner on the training with the performance on the test set, you have to configure the Learner to predict on multiple sets by setting the field predict_sets to c("train", "test") (default is "test"). Each set yields a separate Prediction object during resampling. In the next step, you have to configure the measures to operate on the respective Prediction object:

m1 = msr("classif.ce", id = "ce.train", predict_sets = "train")
m2 = msr("classif.ce", id = "ce.test", predict_sets = "test")

The (list of) created measures can finally be passed to ⁠$aggregate()⁠ or ⁠$score()⁠.

Parallelization

This function can be parallelized with the future or mirai package. One job is one resampling iteration. All jobs are send to an apply function from future.apply or mirai::mirai_map() in a single batch. To select a parallel backend, use future::plan(). To use mirai, call mirai::daemons(.compute = "mlr3_parallelization") before calling this function. The future package guarantees reproducible results independent of the parallel backend. The results of mirai will not be the same but can be made reproducible by setting a seed when calling mirai::daemons(). More on parallelization can be found in the book: https://mlr3book.mlr-org.com/chapters/chapter10/advanced_technical_aspects_of_mlr3.html

Progress Bars

This function supports progress bars via the package progressr. Simply wrap the function call in progressr::with_progress() to enable them. Alternatively, call progressr::handlers() with global = TRUE to enable progress bars globally. We recommend the progress package as backend which can be enabled with progressr::handlers("progress").

Logging

The mlr3 uses the lgr package for logging. lgr supports multiple log levels which can be queried with getOption("lgr.log_levels").

To suppress output and reduce verbosity, you can lower the log from the default level "info" to "warn":

lgr::get_logger("mlr3")$set_threshold("warn")

To get additional log output for debugging, increase the log level to "debug" or "trace":

lgr::get_logger("mlr3")$set_threshold("debug")

To log to a file or a data base, see the documentation of lgr::lgr-package.

Note

The fitted models are discarded after the predictions have been scored in order to reduce memory consumption. If you need access to the models for later analysis, set store_models to TRUE.

Examples

# benchmarking with benchmark_grid()
tasks = lapply(c("penguins", "sonar"), tsk)
learners = lapply(c("classif.featureless", "classif.rpart"), lrn)
resamplings = rsmp("cv", folds = 3)

design = benchmark_grid(tasks, learners, resamplings)
print(design)

set.seed(123)
bmr = benchmark(design)

## Data of all resamplings
head(as.data.table(bmr))

## Aggregated performance values
aggr = bmr$aggregate()
print(aggr)

## Extract predictions of first resampling result
rr = aggr$resample_result[[1]]
as.data.table(rr$prediction())

# Benchmarking with a custom design:
# - fit classif.featureless on penguins with a 3-fold CV
# - fit classif.rpart on sonar using a holdout
tasks = list(tsk("penguins"), tsk("sonar"))
learners = list(lrn("classif.featureless"), lrn("classif.rpart"))
resamplings = list(rsmp("cv", folds = 3), rsmp("holdout"))

design = data.table::data.table(
  task = tasks,
  learner = learners,
  resampling = resamplings
)

## Instantiate resamplings
design$resampling = Map(
  function(task, resampling) resampling$clone()$instantiate(task),
  task = design$task, resampling = design$resampling
)

## Run benchmark
bmr = benchmark(design)
print(bmr)

## Get the training set of the 2nd iteration of the featureless learner on penguins
rr = bmr$aggregate()[learner_id == "classif.featureless"]$resample_result[[1]]
rr$resampling$train_set(2)

mlr3 documentation built on Dec. 4, 2025, 1:06 a.m.

mlr3 index

Package overview README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

mlr3
Machine Learning in R - Next Generation

benchmark: Benchmark Multiple Learners on Multiple Tasks
In mlr3: Machine Learning in R - Next Generation

Benchmark Multiple Learners on Multiple Tasks

Description

Usage

Arguments

Value

Stochasticity

Predict Sets

Parallelization

Progress Bars

Logging

Note

See Also

Examples

Related to benchmark in mlr3...

R Package Documentation

Browse R Packages

We want your feedback!

mlr3 Machine Learning in R - Next Generation

benchmark: Benchmark Multiple Learners on Multiple Tasks In mlr3: Machine Learning in R - Next Generation

Benchmark Multiple Learners on Multiple Tasks

Description

Usage

Arguments

Value

Stochasticity

Predict Sets

Parallelization

Progress Bars

Logging

Note

See Also

Examples

Related to benchmark in mlr3...

R Package Documentation

Browse R Packages

We want your feedback!

mlr3
Machine Learning in R - Next Generation

benchmark: Benchmark Multiple Learners on Multiple Tasks
In mlr3: Machine Learning in R - Next Generation