batchmark: Run machine learning benchmarks as distributed experiments.
In guillermozbta/mir: Machine Learning in R


This function is a highly parallel version of benchmark built on batchtools. Experiments are created in the provided registry for each combination of learners, tasks and resamplings, and the runs can then be started via submitJobs. A job is one train/test split of the outer resampling. In case of nested resampling (e.g. with makeTuneWrapper), each job is a full run of the inner resampling, which can be parallelized in a second step with parallelMap. For details on usage and supported backends, have a look at the batchtools tutorial page: https://github.com/mllg/batchtools.

The general workflow with batchmark looks like this:

1. Create an ExperimentRegistry using makeExperimentRegistry.
2. Call batchmark(...), which defines jobs for all learners and tasks in an expand.grid fashion.
3. Submit the jobs using submitJobs.
4. Babysit the computation and wait for all jobs to finish using waitForJobs.
5. Call reduceBatchmarkResults() to reduce the results into a BenchmarkResult.
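
The workflow above can be sketched as follows. This is a minimal illustration, not a reference setup: the choice of learners, the built-in iris.task and sonar.task, the 3-fold cross-validation and the temporary registry (file.dir = NA) are all assumptions made for the example. It also assumes the default batchtools configuration, in which jobs run sequentially in the current R session, and calls the reduce step under the name reduceBatchmarkResults().

```r
library(mlr)
library(batchtools)

# Step 1: create a registry. file.dir = NA puts it in a temporary
# directory; use a real path to keep results across sessions.
reg = makeExperimentRegistry(file.dir = NA, seed = 1)

# Example inputs (assumed for illustration only).
learners = list(makeLearner("classif.rpart"), makeLearner("classif.lda"))
tasks = list(iris.task, sonar.task)
rdesc = makeResampleDesc("CV", iters = 3)

# Step 2: define one job per learner, task and resampling iteration.
batchmark(learners, tasks, rdesc, measures = list(mmce), reg = reg)

# Steps 3 and 4: submit all jobs and block until they have finished.
submitJobs(reg = reg)
waitForJobs(reg = reg)

# Step 5: collect the results into a BenchmarkResult.
bmr = reduceBatchmarkResults(reg = reg)
perf = getBMRAggrPerformances(bmr, as.df = TRUE)
print(perf)
```

On a cluster, the same script would typically differ only in the batchtools configuration and in the resources passed to submitJobs.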

If you want to use this with OpenML datasets you can generate tasks from a vector of dataset IDs easily with tasks = lapply(data.ids, function(x) convertOMLDataSetToMlr(getOMLDataSet(x))).

batchmark(learners, tasks, resamplings, measures, models = TRUE, reg = batchtools::getDefaultRegistry())

`learners`	[(list of) `Learner` \| `character`] Learning algorithms to compare; can also be a single learner. If you pass strings, the learners will be created via `makeLearner`.
`tasks`	[(list of) `Task`] Tasks that learners should be run on.
`resamplings`	[(list of) `ResampleDesc`] Resampling strategy for each task. If only one is provided, it will be replicated to match the number of tasks. If missing, a 10-fold cross validation is used.
`measures`	[(list of) `Measure`] Performance measures for all tasks. If missing, the default measure of the first task is used.
`models`	[`logical(1)`] Should all fitted models be stored in the `ResampleResult`? Default is `TRUE`.
`reg`	[`Registry`] Registry, created by `makeExperimentRegistry`. If not explicitly passed, uses the last created registry.
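
As a rough illustration of how these arguments fit together, the sketch below passes learners as strings, one resampling description per task, two measures, and an explicit registry. The specific learners, tasks and resampling settings are assumptions chosen for the example, and findJobs is used only to inspect which jobs were defined; nothing is submitted.

```r
library(mlr)
library(batchtools)

# Temporary registry for the example (assumed setup).
reg = makeExperimentRegistry(file.dir = NA)

batchmark(
  # Strings are turned into learners via makeLearner.
  learners = c("classif.rpart", "classif.lda"),
  tasks = list(iris.task, sonar.task),
  # One resampling description per task, in the same order as tasks.
  resamplings = list(
    makeResampleDesc("CV", iters = 2),
    makeResampleDesc("Holdout")
  ),
  measures = list(mmce, acc),
  # Do not keep the fitted models, to reduce the size of stored results.
  models = FALSE,
  reg = reg
)

# Inspect the jobs that were defined in the registry.
jobs = findJobs(reg = reg)
print(jobs)
```

Setting models = FALSE is a common choice for large benchmarks where only the performance values are needed afterwards.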