benchmark: Benchmark Multiple Learners on Multiple Tasks (R Documentation)
Runs a benchmark on arbitrary combinations of tasks (Task), learners (Learner), and resampling strategies (Resampling), possibly in parallel.
For large-scale benchmarking, we recommend using the mlr3batchmark package. It runs benchmark experiments on high-performance computing clusters and handles failed experiments.
benchmark(
design,
store_models = FALSE,
store_backends = TRUE,
encapsulate = NA_character_,
allow_hotstart = FALSE,
clone = c("task", "learner", "resampling"),
unmarshal = TRUE,
callbacks = NULL
)
design: (data.frame()) Data frame (or data.table::data.table()) with three columns "task", "learner", and "resampling", where each row defines one experiment as a combination of a Task, a Learner, and a Resampling strategy. The helper function benchmark_grid() can assist in generating an exhaustive design and in instantiating the Resamplings per Task.
store_models: (logical(1)) Store the fitted models in the resulting object? Set to TRUE if you want to further analyse the models or to extract information such as variable importance.
store_backends: (logical(1)) Keep the DataBackend of the Task in the resulting object? Set to TRUE if your performance measures require a Task, or if you want to analyse the results more conveniently.
encapsulate: (character(1)) If not NA_character_, enables encapsulation by setting the field Learner$encapsulate to this value, e.g. "evaluate" or "callr", so that errors during training or prediction are caught instead of terminating the benchmark.
allow_hotstart: (logical(1)) Determines if learners are hot started with trained models from a HotstartStack where possible.
clone: (character()) Selects the input objects to be cloned before proceeding; possible values are "task", "learner", and "resampling". By default, all input objects are cloned.
unmarshal: (logical(1)) Whether to unmarshal learners after the benchmark has finished.
callbacks: (List of mlr3misc::Callback) Callbacks to be executed during the resampling process.
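For example, a call that exercises some of these arguments might look like this (a sketch; the design is built with benchmark_grid() as in the examples below, and encapsulation via "evaluate" requires the evaluate package):
library(mlr3)
design = benchmark_grid(tsk("penguins"), lrn("classif.rpart"), rsmp("cv", folds = 3))
# Keep the fitted models and capture learner errors instead of aborting
bmr = benchmark(design, store_models = TRUE, encapsulate = "evaluate")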
Returns a BenchmarkResult.
Note that uninstantiated Resamplings are instantiated on the task, which makes the function stochastic even for deterministic learners.
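If you need reproducible splits, set a seed before calling benchmark(), or instantiate the resamplings yourself beforehand, for example:
library(mlr3)
task = tsk("penguins")
resampling = rsmp("cv", folds = 3)
# Instantiating fixes the train/test splits up front, so the subsequent
# benchmark() call is deterministic for deterministic learners
set.seed(42)
resampling$instantiate(task)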
If you want to compare the performance of a learner on the training set with the performance on the test set, you have to configure the Learner to predict on multiple sets by setting the field predict_sets to c("train", "test") (default is "test"). Each set yields a separate Prediction object during resampling. In the next step, you have to configure the measures to operate on the respective Prediction object:
m1 = msr("classif.ce", id = "ce.train", predict_sets = "train")
m2 = msr("classif.ce", id = "ce.test", predict_sets = "test")
The (list of) created measures can finally be passed to $aggregate() or $score().
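Putting this together, a minimal sketch (reusing the penguins task and rpart learner from the examples below) could look like:
library(mlr3)
learner = lrn("classif.rpart")
learner$predict_sets = c("train", "test")  # predict on both sets
design = benchmark_grid(tsk("penguins"), learner, rsmp("cv", folds = 3))
bmr = benchmark(design)
# One measure per predict set, distinguished by id
m1 = msr("classif.ce", id = "ce.train", predict_sets = "train")
m2 = msr("classif.ce", id = "ce.test", predict_sets = "test")
bmr$aggregate(list(m1, m2))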
This function can be parallelized with the future or mirai package. One job is one resampling iteration. All jobs are sent to an apply function from future.apply or to mirai::mirai_map() in a single batch. To select a parallel backend, use future::plan(). To use mirai, call mirai::daemons(.compute = "mlr3_parallelization") before calling this function. The future package guarantees reproducible results independent of the parallel backend. The results of mirai will not be the same, but can be made reproducible by setting a seed when calling mirai::daemons().
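For example (a sketch; the number of workers and the seed are assumptions, and design is a benchmark design as in the examples below):
# future backend: run iterations in separate local R sessions
future::plan("multisession", workers = 4)
bmr = benchmark(design)
# mirai backend: start daemons under the dedicated compute profile;
# the seed makes the mirai results reproducible
mirai::daemons(4, seed = 123, .compute = "mlr3_parallelization")
bmr = benchmark(design)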
More on parallelization can be found in the book:
https://mlr3book.mlr-org.com/chapters/chapter10/advanced_technical_aspects_of_mlr3.html
This function supports progress bars via the package progressr. Simply wrap the function call in progressr::with_progress() to enable them. Alternatively, call progressr::handlers() with global = TRUE to enable progress bars globally. We recommend the progress package as backend, which can be enabled with progressr::handlers("progress").
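For example:
# Progress bar for a single call
progressr::with_progress(bmr <- benchmark(design))
# Or enable the recommended backend globally
progressr::handlers("progress")
progressr::handlers(global = TRUE)
bmr = benchmark(design)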
mlr3 uses the lgr package for logging. lgr supports multiple log levels, which can be queried with getOption("lgr.log_levels"). To suppress output and reduce verbosity, you can lower the log level from the default "info" to "warn":
lgr::get_logger("mlr3")$set_threshold("warn")
To get additional log output for debugging, increase the log level to "debug" or "trace":
lgr::get_logger("mlr3")$set_threshold("debug")
To log to a file or a database, see the documentation of lgr::lgr-package.
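For example, to additionally write all mlr3 log messages to a file (a sketch; the file name and appender name are assumptions):
# Attach a file appender to the mlr3 logger
lgr::get_logger("mlr3")$add_appender(
  lgr::AppenderFile$new("mlr3.log"), name = "file"
)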
The fitted models are discarded after the predictions have been scored in order to reduce memory consumption. If you need access to the models for later analysis, set store_models to TRUE.
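With stored models, the fitted objects can be retrieved from the individual ResampleResults later on, for example:
bmr = benchmark(design, store_models = TRUE)
# Fitted model of the first learner in the first resample result
rr = bmr$resample_result(1)
rr$learners[[1]]$model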
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter3/evaluation_and_benchmarking.html#sec-benchmarking
Package mlr3viz for some generic visualizations.
mlr3benchmark for post-hoc analysis of benchmark results.
Other benchmark: BenchmarkResult, benchmark_grid()
# benchmarking with benchmark_grid()
tasks = lapply(c("penguins", "sonar"), tsk)
learners = lapply(c("classif.featureless", "classif.rpart"), lrn)
resamplings = rsmp("cv", folds = 3)
design = benchmark_grid(tasks, learners, resamplings)
print(design)
set.seed(123)
bmr = benchmark(design)
## Data of all resamplings
head(as.data.table(bmr))
## Aggregated performance values
aggr = bmr$aggregate()
print(aggr)
## Extract predictions of first resampling result
rr = aggr$resample_result[[1]]
as.data.table(rr$prediction())
# Benchmarking with a custom design:
# - fit classif.featureless on penguins with a 3-fold CV
# - fit classif.rpart on sonar using a holdout
tasks = list(tsk("penguins"), tsk("sonar"))
learners = list(lrn("classif.featureless"), lrn("classif.rpart"))
resamplings = list(rsmp("cv", folds = 3), rsmp("holdout"))
design = data.table::data.table(
task = tasks,
learner = learners,
resampling = resamplings
)
## Instantiate resamplings
design$resampling = Map(
function(task, resampling) resampling$clone()$instantiate(task),
task = design$task, resampling = design$resampling
)
## Run benchmark
bmr = benchmark(design)
print(bmr)
## Get the training set of the 2nd iteration of the featureless learner on penguins
rr = bmr$aggregate()[learner_id == "classif.featureless"]$resample_result[[1]]
rr$resampling$train_set(2)