batchmark: Run machine learning benchmarks as distributed experiments.

View source: R/batchmark.R

Description

This function is a parallel version of benchmark built on batchtools. Experiments are created in the provided registry for each combination of learners, tasks and resamplings, and the runs can then be started via submitJobs. A job is one train/test split of the outer resampling. In case of nested resampling (e.g. with makeTuneWrapper), each job is a full run of the inner resampling, which can be parallelized in a second step with parallelMap. For details on the usage and supported backends, have a look at the batchtools tutorial page: https://github.com/mllg/batchtools.
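To illustrate the job granularity under nested resampling, here is a minimal sketch, assuming mlr, batchtools and rpart are installed. The learner, the tuning grid and the resampling sizes are illustrative choices, not part of this documentation. Each of the three outer folds becomes one job, and the inner holdout tuning runs inside that job.

```r
library(mlr)
library(batchtools)

# Temporary registry (file.dir = NA places it in a temp directory)
reg = makeExperimentRegistry(file.dir = NA, seed = 1)

# Inner resampling: tune rpart's cp over a small, illustrative grid
ps = makeParamSet(makeDiscreteParam("cp", values = c(0.01, 0.1)))
inner = makeResampleDesc("Holdout")
lrn = makeTuneWrapper(makeLearner("classif.rpart"), resampling = inner,
  par.set = ps, control = makeTuneControlGrid(), show.info = FALSE)

# Outer resampling: 3-fold CV, one job per outer train/test split
outer = makeResampleDesc("CV", iters = 3)
ids = batchmark(list(lrn), list(iris.task), outer, reg = reg)

# One row per defined job
print(nrow(ids))
```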

The general workflow with batchmark looks like this:

  1. Create an ExperimentRegistry using makeExperimentRegistry.

  2. Call batchmark(...) which defines jobs for all learners and tasks in an expand.grid fashion.

  3. Submit jobs using submitJobs.

  4. Monitor the computation and wait for all jobs to finish using waitForJobs.

  5. Call reduceBatchmarkResults() to reduce results into a BenchmarkResult.
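The five steps above can be sketched end to end as follows. This is a minimal example under stated assumptions: mlr, batchtools, rpart and MASS are installed, jobs run with the registry's default cluster functions on the local machine, and the learners, tasks and 3-fold CV are illustrative choices.

```r
library(mlr)
library(batchtools)

# 1. Create a registry (file.dir = NA uses a temporary directory)
reg = makeExperimentRegistry(file.dir = NA, seed = 1)

# 2. Define jobs for all learner/task combinations
learners = list(makeLearner("classif.rpart"), makeLearner("classif.lda"))
tasks = list(iris.task, sonar.task)
rdesc = makeResampleDesc("CV", iters = 3)
ids = batchmark(learners, tasks, rdesc, measures = list(mmce), reg = reg)

# 3. Submit all jobs
submitJobs(reg = reg)

# 4. Wait for the jobs to finish
waitForJobs(reg = reg)

# 5. Collect the results into a BenchmarkResult
bmr = reduceBatchmarkResults(reg = reg)
perf = getBMRAggrPerformances(bmr, as.df = TRUE)
print(perf)
```

On a cluster, the same script would typically differ only in how the registry's cluster functions are configured; see the batchtools documentation for the available backends.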

If you want to use this with OpenML datasets you can generate tasks from a vector of dataset IDs easily with tasks = lapply(data.ids, function(x) convertOMLDataSetToMlr(getOMLDataSet(x))).

Usage

batchmark(learners, tasks, resamplings, measures, models = TRUE,
  reg = batchtools::getDefaultRegistry())

Arguments

learners

[(list of) Learner | character]
Learning algorithms to compare; can also be a single learner. If you pass strings, the learners will be created via makeLearner.

tasks

[(list of) Task]
Tasks that learners should be run on.

resamplings

[(list of) ResampleDesc]
Resampling strategy for each task. If only one is provided, it will be replicated to match the number of tasks. If missing, a 10-fold cross-validation is used.
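As a hedged illustration of passing one resampling strategy per task, the sketch below assumes mlr, batchtools and rpart are installed; the tasks and fold counts are arbitrary. The list of resamplings is matched to the list of tasks by position, so the number of jobs is the sum of the iterations over all tasks, times the number of learners.

```r
library(mlr)
library(batchtools)

reg = makeExperimentRegistry(file.dir = NA, seed = 1)

tasks = list(iris.task, sonar.task)
# One ResampleDesc per task, in the same order as 'tasks'
resamplings = list(
  makeResampleDesc("CV", iters = 2),  # used for iris.task
  makeResampleDesc("CV", iters = 5)   # used for sonar.task
)

ids = batchmark(list(makeLearner("classif.rpart")), tasks, resamplings,
  reg = reg)

# One learner, 2 + 5 outer iterations in total
print(nrow(ids))
```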

measures

[(list of) Measure]
Performance measures for all tasks. If missing, the default measure of the first task is used.

models

[logical(1)]
Should all fitted models be stored in the ResampleResult? Default is TRUE.

reg

[Registry]
Registry, created by makeExperimentRegistry. If not explicitly passed, uses the last created registry.

Value

[data.table]. Generated job ids are stored in the column “job.id”.
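The returned table can be passed on to batchtools functions that accept job ids. The following sketch, assuming mlr, batchtools and rpart are installed and using an illustrative learner and task, submits only the first two jobs and then checks which jobs are done.

```r
library(mlr)
library(batchtools)

reg = makeExperimentRegistry(file.dir = NA, seed = 1)
ids = batchmark(list(makeLearner("classif.rpart")), list(iris.task),
  makeResampleDesc("CV", iters = 3), reg = reg)

# Submit a subset of the jobs, selected by their rows in 'ids'
first.two = head(ids, 2)
submitJobs(ids = first.two, reg = reg)
waitForJobs(ids = first.two, reg = reg)

# Job ids of finished jobs, again as a data.table with column 'job.id'
done = findDone(reg = reg)
print(done)
```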

See Also

Other benchmark: BenchmarkResult, benchmark, convertBMRToRankMatrix, friedmanPostHocTestBMR, friedmanTestBMR, generateCritDifferencesData, getBMRAggrPerformances, getBMRFeatSelResults, getBMRFilteredFeatures, getBMRLearnerIds, getBMRLearnerShortNames, getBMRLearners, getBMRMeasureIds, getBMRMeasures, getBMRModels, getBMRPerformances, getBMRPredictions, getBMRTaskDescs, getBMRTaskIds, getBMRTuneResults, plotBMRBoxplots, plotBMRRanksAsBarChart, plotBMRSummary, plotCritDifferences, reduceBatchmarkResults


berndbischl/mlr documentation built on Nov. 25, 2017, 9:09 a.m.