AutoMLBase: AutoMLBase


Description

Base class for AutoML in mlr3automl. Has subclasses for Classification and Regression.

Internals

The AutoMLBase class uses mlr3pipelines to create a machine learning pipeline.
This pipeline contains multiple models (Logistic Regression, Random Forest, Gradient Boosting), which are wrapped in a GraphLearner.
This GraphLearner is wrapped in an AutoTuner for Hyperparameter Optimization and proper resampling.
Tuning is performed using Hyperband.

Construction

Objects should be created using the AutoML interface function.

model = AutoML(task, learner_list, learner_timeout, resampling, measure, runtime,
               terminator, preprocessing, portfolio)
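
A minimal sketch of a typical call, assuming mlr3, mlr3automl and the default learner packages are installed. The task, the learner subset and the 120-second budget are illustrative choices, not recommendations from the package. The example is guarded so that it only runs when the packages are available.

```r
# Only run the example when the required packages are installed.
pkgs = c("mlr3", "mlr3automl")
have_pkgs = all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

if (have_pkgs) {
  library(mlr3)
  library(mlr3automl)

  task = tsk("iris")  # built-in mlr3 classification task

  # All arguments except `task` are optional; the values are illustrative.
  model = AutoML(
    task,
    learner_list = c("classif.ranger", "classif.xgboost"),
    measure = msr("classif.acc"),
    runtime = 120,      # tuning budget in seconds
    portfolio = FALSE   # skip the fixed portfolio of learners
  )
}
```

Arguments that are left out fall back to the defaults listed under Method new().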

Public fields

task

(Task)
Contains the task to be solved.

learner_list

(list() | character())
List of names from mlr_learners. Can be used to customize the learners to be tuned over.

learner_timeout

(integer(1))
Budget (in seconds) for a single parameter evaluation during model training.
If this budget is exceeded, the evaluation is stopped and performance measured with the fallback LearnerClassifFeatureless or LearnerRegrFeatureless.
When this is NULL (default), the learner timeout defaults to runtime / 5.
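
The default rule can be sketched in plain R. The helper name resolve_learner_timeout is hypothetical and not part of mlr3automl; it only mirrors the documented behaviour that a NULL timeout becomes runtime / 5.

```r
# Hypothetical helper: mirrors the documented default, not the package source.
resolve_learner_timeout = function(learner_timeout = NULL, runtime = Inf) {
  if (is.null(learner_timeout)) runtime / 5 else learner_timeout
}

resolve_learner_timeout(runtime = 300)                        # 60
resolve_learner_timeout(learner_timeout = 30, runtime = 300)  # 30
resolve_learner_timeout()                                     # Inf
```

Under this rule, leaving both runtime and learner_timeout at their defaults gives an infinite timeout, that is, no limit on a single evaluation.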

resampling

(Resampling)
Contains the resampling method to be used for hyper-parameter optimization.

measure

(Measure)
Contains the performance measure to optimize during training.

learner

(AutoTuner)
The ML pipeline at the core of mlr3automl is an AutoTuner containing a GraphLearner.

runtime

(integer(1))
Number of seconds for which to run the optimization. Does not include training time of the final model.
Defaults to Inf, letting Hyperband terminate the tuning.

tuning_terminator

(Terminator)
Contains an optional additional termination criterion for model tuning.
Note that the Hyperband tuner might stop training before the budget is exhausted. Do not use TerminatorRunTime here; use the separate runtime parameter instead.
Defaults to TerminatorNone, letting Hyperband terminate the tuning.

tuner

(TunerHyperband)
Tuning is performed using TunerHyperband with subsampling fractions in the range [0.1, 1] and η = 3.
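
To illustrate what these settings imply, the following base R sketch computes the standard Hyperband bracket schedule for a minimum fraction of 0.1, a maximum fraction of 1 and η = 3. The function hyperband_schedule is hypothetical and follows the textbook formulas; the schedule that mlr3hyperband actually builds may differ in rounding details.

```r
# Hypothetical helper implementing the textbook Hyperband schedule.
hyperband_schedule = function(min_frac = 0.1, max_frac = 1, eta = 3) {
  ratio = max_frac / min_frac
  # Number of halving steps in the most aggressive bracket
  # (small tolerance guards against floating point error in the log).
  s_max = floor(log(ratio) / log(eta) + 1e-9)
  rows = list()
  for (s in s_max:0) {
    # Number of configurations sampled at the start of bracket s.
    n_start = ceiling((s_max + 1) / (s + 1) * eta^s)
    for (i in 0:s) {
      rows[[length(rows) + 1]] = data.frame(
        bracket = s,
        stage = i,
        n_configs = floor(n_start / eta^i),  # survivors at this stage
        fraction = max_frac / eta^(s - i)    # subsampling fraction used
      )
    }
  }
  do.call(rbind, rows)
}

schedule = hyperband_schedule()
print(schedule)
```

With these inputs the sketch yields three brackets that start with 9, 5 and 3 configurations at fractions of roughly 0.11, 0.33 and 1. In each bracket only about one third of the configurations advance to the next, larger fraction.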

preprocessing

(character(1) | Graph)
Type of preprocessing to be used. Possible values are:

  • "none": No preprocessing at all

  • "stability": pipeline_robustify is used to guarantee stability of the learners in the pipeline

  • "full": Adds additional preprocessing operators for Imputation, Impact Encoding and PCA.
    The choice of preprocessing operators is optimised during tuning.

Alternatively, a Graph object can be used to specify a custom preprocessing pipeline.

portfolio

(logical(1))
Whether or not to try a fixed portfolio of known good learners prior to tuning.

additional_params

(ParamSet)
Additional parameter space to tune over, e.g. for custom learners / preprocessing.

custom_trafo

(function(x, param_set))
Trafo function to be applied in addition to existing transformations. Can be used to transform additional_params.

Methods

Public methods


Method new()

Creates a new instance of this R6 class.

Usage
AutoMLBase$new(
  task,
  learner_list = NULL,
  learner_timeout = NULL,
  resampling = NULL,
  measure = NULL,
  runtime = Inf,
  terminator = NULL,
  preprocessing = NULL,
  portfolio = TRUE,
  additional_params = NULL,
  custom_trafo = NULL
)
Arguments
task

(Task)
Contains the task to be solved. Currently TaskClassif and TaskRegr are supported.

learner_list

(list() | character())
List of names from mlr_learners. Can be used to customize the learners to be tuned over.
Default learners for classification: c("classif.ranger", "classif.xgboost", "classif.liblinear")
Default learners for regression: c("regr.ranger", "regr.xgboost", "regr.svm", "regr.liblinear", "regr.cv_glmnet")
A user-provided learner that is incompatible with the provided task might break mlr3automl.

learner_timeout

(integer(1))
Budget (in seconds) for a single parameter evaluation during model training.
If this budget is exceeded, the evaluation is stopped and performance measured with the fallback LearnerClassifFeatureless or LearnerRegrFeatureless.
When this is NULL (default), the learner timeout defaults to runtime / 5.

resampling

(Resampling)
Contains the resampling method to be used for hyper-parameter optimization. Defaults to ResamplingHoldout.

measure

(Measure)
Contains the performance measure to optimize during training.
Defaults to Accuracy for classification and RMSE for regression.

runtime

(integer(1))
Number of seconds for which to run the optimization. Does not include training time of the final model.
Defaults to Inf, letting Hyperband terminate the tuning.

terminator

(Terminator)
Contains an optional additional termination criterion for model tuning.
Note that the Hyperband tuner might stop training before the budget is exhausted. Do not use TerminatorRunTime here; use the separate runtime parameter instead.
Defaults to TerminatorNone, letting Hyperband terminate the tuning.

preprocessing

(character(1) | Graph)
Type of preprocessing to be used. Possible values are:

  • "none": No preprocessing at all

  • "stability": pipeline_robustify is used to guarantee stability of the learners in the pipeline

  • "full": Adds additional preprocessing operators for Imputation, Impact Encoding and PCA.
    The choice of preprocessing operators is optimised during tuning.

Alternatively, a Graph object can be used to specify a custom preprocessing pipeline.
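
Both forms of the preprocessing argument can be sketched as follows, assuming mlr3, mlr3pipelines and mlr3automl are installed. The pima task, the choice of median imputation followed by scaling, and the runtime are illustrative assumptions.

```r
# Only run the example when the required packages are installed.
pkgs = c("mlr3", "mlr3pipelines", "mlr3automl")
have_pkgs = all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

if (have_pkgs) {
  library(mlr3)
  library(mlr3pipelines)
  library(mlr3automl)

  task = tsk("pima")  # built-in task with missing values

  # Option 1: one of the predefined preprocessing types.
  model_stable = AutoML(task, preprocessing = "stability", runtime = 120)

  # Option 2: a custom Graph built from mlr3pipelines operators.
  custom_preproc = po("imputemedian") %>>% po("scale")
  model_custom = AutoML(task, preprocessing = custom_preproc, runtime = 120)
}
```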

portfolio

(logical(1))
By default (TRUE), mlr3automl tries out a fixed portfolio of known good learners prior to tuning.
Set portfolio = FALSE to skip these portfolio learners.

additional_params

(ParamSet)
Additional parameter space to tune over, e.g. for custom learners / preprocessing.

custom_trafo

(function(x, param_set))
Trafo function to be applied in addition to existing transformations. Can be used to transform additional_params.
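
The following sketch shows how additional_params and custom_trafo might be combined for a learner outside the defaults. The parameter id classif.kknn.k, its bounds and the power-of-two transformation are assumptions made for illustration; mlr3automl may expect a different id or additional tags on the parameter.

```r
# Trafo with the documented signature function(x, param_set):
# x is a named list of sampled values, and k is tuned on a log2 scale.
my_trafo = function(x, param_set) {
  if (!is.null(x[["classif.kknn.k"]])) {
    x[["classif.kknn.k"]] = 2^x[["classif.kknn.k"]]
  }
  x
}

# The trafo can be checked on a plain list before it is passed on.
my_trafo(list(classif.kknn.k = 3), param_set = NULL)$classif.kknn.k  # 8

# Only run the AutoML part when the required packages are installed.
pkgs = c("mlr3", "mlr3learners", "paradox", "kknn", "mlr3automl")
have_pkgs = all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

if (have_pkgs) {
  library(mlr3)
  library(mlr3learners)
  library(paradox)
  library(mlr3automl)

  # Assumed parameter id; bounds are illustrative.
  extra_params = ps(classif.kknn.k = p_int(lower = 1, upper = 5))

  model = AutoML(
    tsk("iris"),
    learner_list = "classif.kknn",
    additional_params = extra_params,
    custom_trafo = my_trafo,
    runtime = 60
  )
}
```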

Returns

AutoMLBase


Method train()

Trains the AutoML system.

Usage
AutoMLBase$train(row_ids = NULL)
Arguments
row_ids

(integer())
Vector of training indices.


Method predict()

Returns a Prediction object for the given data based on the trained model.

Usage
AutoMLBase$predict(data = NULL, row_ids = NULL)
Arguments
data

(data.frame | data.table | Task)
New observations to be predicted. If NULL, defaults to the task the model was trained on.

row_ids

(integer())
Vector of row indices to predict on.

Returns

PredictionClassif | PredictionRegr
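
A minimal sketch of train() and predict() on disjoint row sets, assuming mlr3 and mlr3automl are installed. The split sizes, the seed and the 60-second runtime are illustrative.

```r
# Only run the example when the required packages are installed.
pkgs = c("mlr3", "mlr3automl")
have_pkgs = all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

if (have_pkgs) {
  library(mlr3)
  library(mlr3automl)

  task = tsk("iris")
  set.seed(1)
  train_ids = sample(task$nrow, 100)                  # rows used for training
  test_ids = setdiff(seq_len(task$nrow), train_ids)   # held-out rows

  model = AutoML(task, runtime = 60)
  model$train(row_ids = train_ids)

  # Predict on the held-out rows of the training task and score the result.
  prediction = model$predict(row_ids = test_ids)
  print(prediction$score(msr("classif.acc")))
}
```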


Method resample()

Performs nested resampling. ResamplingHoldout is used for the outer resampling.

Usage
AutoMLBase$resample()
Returns

ResampleResult


Method tuned_params()

Helper to extract the best hyperparameters from a tuned model.

Usage
AutoMLBase$tuned_params()
Returns

data.table
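
The two helpers above, resample() and tuned_params(), might be used as follows, assuming mlr3 and mlr3automl are installed. The task, measure and runtime are illustrative, and calling train() without row_ids is assumed to train on all rows of the task.

```r
# Only run the example when the required packages are installed.
pkgs = c("mlr3", "mlr3automl")
have_pkgs = all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

if (have_pkgs) {
  library(mlr3)
  library(mlr3automl)

  model = AutoML(tsk("iris"), runtime = 60)

  # Nested resampling gives a performance estimate for the whole AutoML system.
  rr = model$resample()
  print(rr$aggregate(msr("classif.acc")))

  # The best hyperparameters are available once the model has been trained.
  model$train()
  print(model$tuned_params())
}
```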


Method explain()

Creates explanation objects for a trained model.

Usage
AutoMLBase$explain(iml_package = "DALEX")
Arguments
iml_package

(character(1))
Package to be used: either DALEX or iml. Defaults to DALEX.

Returns

explainer object
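
A short sketch of explain() with the default DALEX backend, assuming mlr3, mlr3automl and DALEX are installed. Permutation feature importance via DALEX::model_parts() is one possible follow-up; the task and runtime are illustrative.

```r
# Only run the example when the required packages are installed.
pkgs = c("mlr3", "mlr3automl", "DALEX")
have_pkgs = all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

if (have_pkgs) {
  library(mlr3)
  library(mlr3automl)

  model = AutoML(tsk("iris"), runtime = 60)
  model$train()  # explain() requires a trained model

  explainer = model$explain(iml_package = "DALEX")

  # Permutation-based feature importance computed by DALEX.
  importance = DALEX::model_parts(explainer)
  plot(importance)
}
```

Passing iml_package = "iml" returns an explainer for the iml package instead.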


Method clone()

The objects of this class are cloneable with this method.

Usage
AutoMLBase$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


a-hanf/mlr3automl documentation built on Feb. 21, 2022, 1:06 a.m.