knitr::opts_chunk$set(echo = TRUE) TRAVIS <- !identical(tolower(Sys.getenv("TRAVIS")), "true") knitr::opts_chunk$set(purl = TRAVIS)
TPOT comes with a handful of default operators and parameter configurations that work well for optimizing machine learning pipelines. For tpotr two of these are available: The default configuration and a light-weighted configuration.
The default configuration for tpotr will search over a broad range of preprocessors, feature constructors, feature selectors, models, and parameters to find a series of operators that minimize the error of the model predictions. If you do not set a config_dict
parameter, the default configuration is applied. However, some of these operators are complex and may take a long time to run, especially on larger datasets.
For theses cases, the config_dict="TPOT light"
will let TPOT search over a restricted range of preprocessors, feature constructors, feature selectors, models, and parameters to find a series of operators that minimize the error of the model predictions. The pipelines are simpler structured and only fast-running operators will be used. This configuration works for both the TPOTClassifier and TPOTRegressor.
The effects of config_dict
settings can be evaluated using microbenchmarking and prediction accuracy measurement.
Initially, it turns out that the TPOT light configuration runs much faster.
library("mlr") library("microbenchmark") library("tpotr") learner.default = makeLearner(cl = "classif.tpot", population_size = 20, generations = 3, n_jobs = 3, verbosity = 2) learner.light = makeLearner(cl = "classif.tpot", population_size = 20, generations = 3, n_jobs = 3, verbosity = 2, config_dict = "TPOT light") microbenchmark(train(learner.default, iris.task), train(learner.light, iris.task), times = 5L)
The increased speed, however, is at the expense of prediction accuracy.
model.default = train(learner.default, iris.task) model.light = train(learner.default, iris.task) pred.default = predict(model.default, newdata = iris) pred.light = predict(model.light, newdata = iris) data.frame(default = unname(performance(pred.default, measures = list(acc))), light = unname(performance(pred.light, measures = list(acc))))
The choice of config_dict
must therefore depend on the time available to calculate the pipeline, as well as on the required accuracy. If computing time is available "unlimited", the default configuration should be used. However, if it is necessary to achieve sufficiently good results quickly, the TPOT light configuration can be used.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.