mlr_learners_regr.xgboost: Extreme Gradient Boosting Regression Learner

Extreme Gradient Boosting Regression Learner

Description

eXtreme Gradient Boosting regression. Calls xgboost::xgb.train() from package xgboost.

To compute on GPUs, you first need to compile xgboost yourself and link against CUDA. See https://xgboost.readthedocs.io/en/stable/build.html#building-with-gpu-support.

Note that using the watchlist parameter directly will lead to problems when wrapping this Learner in an mlr3pipelines GraphLearner, as the preprocessing steps will not be applied to the data in the watchlist.

Dictionary

This Learner can be instantiated via the dictionary mlr_learners or with the associated sugar function lrn():

mlr_learners$get("regr.xgboost")
lrn("regr.xgboost")

Meta Information

  • Task type: “regr”

  • Predict Types: “response”

  • Feature Types: “logical”, “integer”, “numeric”

  • Required Packages: mlr3, mlr3learners, xgboost
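
Only logical, integer, and numeric features are supported, so factor features have to be encoded before training. A minimal sketch, assuming the mlr3pipelines package is available; po("encode") one-hot encodes factor columns before they reach xgboost:

library(mlr3)
library(mlr3learners)
library(mlr3pipelines)

# Encode factors, then fit xgboost on the encoded task
graph = po("encode", method = "one-hot") %>>% lrn("regr.xgboost", nrounds = 50)
glearner = as_learner(graph)

# Works for any regression task; factor columns (if present) are encoded
glearner$train(tsk("mtcars"))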

Parameters

Id | Type | Default | Levels | Range
alpha | numeric | 0 | - | [0, Inf)
approxcontrib | logical | FALSE | TRUE, FALSE | -
base_score | numeric | 0.5 | - | (-Inf, Inf)
booster | character | gbtree | gbtree, gblinear, dart | -
callbacks | untyped | list | - | -
colsample_bylevel | numeric | 1 | - | [0, 1]
colsample_bynode | numeric | 1 | - | [0, 1]
colsample_bytree | numeric | 1 | - | [0, 1]
device | untyped | cpu | - | -
disable_default_eval_metric | logical | FALSE | TRUE, FALSE | -
early_stopping_rounds | integer | NULL | - | [1, Inf)
early_stopping_set | character | none | none, train, test | -
eta | numeric | 0.3 | - | [0, 1]
eval_metric | untyped | rmse | - | -
feature_selector | character | cyclic | cyclic, shuffle, random, greedy, thrifty | -
feval | untyped | - | - | -
gamma | numeric | 0 | - | [0, Inf)
grow_policy | character | depthwise | depthwise, lossguide | -
interaction_constraints | untyped | - | - | -
iterationrange | untyped | - | - | -
lambda | numeric | 1 | - | [0, Inf)
lambda_bias | numeric | 0 | - | [0, Inf)
max_bin | integer | 256 | - | [2, Inf)
max_delta_step | numeric | 0 | - | [0, Inf)
max_depth | integer | 6 | - | [0, Inf)
max_leaves | integer | 0 | - | [0, Inf)
maximize | logical | NULL | TRUE, FALSE | -
min_child_weight | numeric | 1 | - | [0, Inf)
missing | numeric | NA | - | (-Inf, Inf)
monotone_constraints | untyped | 0 | - | -
normalize_type | character | tree | tree, forest | -
nrounds | integer | - | - | [1, Inf)
nthread | integer | 1 | - | [1, Inf)
ntreelimit | integer | NULL | - | [1, Inf)
num_parallel_tree | integer | 1 | - | [1, Inf)
objective | untyped | reg:squarederror | - | -
one_drop | logical | FALSE | TRUE, FALSE | -
outputmargin | logical | FALSE | TRUE, FALSE | -
predcontrib | logical | FALSE | TRUE, FALSE | -
predinteraction | logical | FALSE | TRUE, FALSE | -
predleaf | logical | FALSE | TRUE, FALSE | -
print_every_n | integer | 1 | - | [1, Inf)
process_type | character | default | default, update | -
rate_drop | numeric | 0 | - | [0, 1]
refresh_leaf | logical | TRUE | TRUE, FALSE | -
reshape | logical | FALSE | TRUE, FALSE | -
sampling_method | character | uniform | uniform, gradient_based | -
sample_type | character | uniform | uniform, weighted | -
save_name | untyped | - | - | -
save_period | integer | NULL | - | [0, Inf)
scale_pos_weight | numeric | 1 | - | (-Inf, Inf)
seed_per_iteration | logical | FALSE | TRUE, FALSE | -
skip_drop | numeric | 0 | - | [0, 1]
strict_shape | logical | FALSE | TRUE, FALSE | -
subsample | numeric | 1 | - | [0, 1]
top_k | integer | 0 | - | [0, Inf)
training | logical | FALSE | TRUE, FALSE | -
tree_method | character | auto | auto, exact, approx, hist, gpu_hist | -
tweedie_variance_power | numeric | 1.5 | - | [1, 2]
updater | untyped | - | - | -
verbose | integer | 1 | - | [0, 2]
watchlist | untyped | - | - | -
xgb_model | untyped | - | - | -
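
Hyperparameters from the table above can be set as arguments to lrn() at construction time or changed later through the learner's param_set; the values below are illustrative only:

library(mlr3)
library(mlr3learners)

# Set parameters at construction ...
learner = lrn("regr.xgboost", nrounds = 200, eta = 0.1, max_depth = 4)

# ... or modify them afterwards via the parameter set
learner$param_set$values$subsample = 0.8
learner$param_set$values$colsample_bytree = 0.8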

Early stopping

Early stopping can be used to find the optimal number of boosting rounds. The early_stopping_set parameter controls which data set is used to monitor performance. Set early_stopping_set = "test" to monitor the performance of the model on the test set while training. The test set for early stopping can be set with the "test" row role in the mlr3::Task. Additionally, early_stopping_rounds must be set to the number of rounds after which training stops if the performance does not improve, and nrounds to the maximum number of boosting rounds. During resampling, the test set is automatically provided by the mlr3::Resampling. Note that using the test set for early stopping can potentially bias the performance scores. See the section on early stopping in the examples.

Initial parameter values

  • nrounds:

    • Actual default: no default.

    • Adjusted default: 1.

    • Reason for change: Without a default, construction of the learner would error. The adjusted default is only a placeholder to work around this; nrounds needs to be tuned by the user (see the tuning sketch after this list).

  • nthread:

    • Actual value: Undefined, triggering auto-detection of the number of CPUs.

    • Adjusted value: 1.

    • Reason for change: Conflicting with parallelization via future.

  • verbose:

    • Actual default: 1.

    • Adjusted default: 0.

    • Reason for change: Reduce verbosity.
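
Since nrounds has no meaningful default, it usually has to be tuned. A minimal sketch, assuming the mlr3tuning package is available; the search ranges and evaluation budget below are illustrative assumptions, not recommendations:

library(mlr3)
library(mlr3learners)
library(mlr3tuning)

# Mark nrounds and eta as tuning parameters (ranges are illustrative)
learner = lrn("regr.xgboost",
  nrounds = to_tune(16, 512),
  eta = to_tune(1e-3, 0.3, logscale = TRUE)
)

# Random search with a small evaluation budget
instance = tune(
  tuner = tnr("random_search"),
  task = tsk("mtcars"),
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measures = msr("regr.rmse"),
  term_evals = 20
)
instance$result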

Super classes

mlr3::Learner -> mlr3::LearnerRegr -> LearnerRegrXgboost

Methods

Public methods

Method new()

Creates a new instance of this R6 class.

Usage
LearnerRegrXgboost$new()

Method importance()

The importance scores are calculated with xgboost::xgb.importance().

Usage
LearnerRegrXgboost$importance()
Returns

Named numeric().


Method clone()

The objects of this class are cloneable with this method.

Usage
LearnerRegrXgboost$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Note

To compute on GPUs, you first need to compile xgboost yourself and link against CUDA. See https://xgboost.readthedocs.io/en/stable/build.html#building-with-gpu-support.

References

Chen, Tianqi, Guestrin, Carlos (2016). “Xgboost: A scalable tree boosting system.” In Proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 785–794. ACM. doi:10.1145/2939672.2939785.

See Also

Other Learner: mlr_learners_classif.cv_glmnet, mlr_learners_classif.glmnet, mlr_learners_classif.kknn, mlr_learners_classif.lda, mlr_learners_classif.log_reg, mlr_learners_classif.multinom, mlr_learners_classif.naive_bayes, mlr_learners_classif.nnet, mlr_learners_classif.qda, mlr_learners_classif.ranger, mlr_learners_classif.svm, mlr_learners_classif.xgboost, mlr_learners_regr.cv_glmnet, mlr_learners_regr.glmnet, mlr_learners_regr.kknn, mlr_learners_regr.km, mlr_learners_regr.lm, mlr_learners_regr.nnet, mlr_learners_regr.ranger, mlr_learners_regr.svm

Examples

## Not run: 
if (requireNamespace("xgboost", quietly = TRUE)) {
# Define the Learner and set parameter values
learner = lrn("regr.xgboost")
print(learner)

# Define a Task
task = tsk("mtcars")

# Create train and test set
ids = partition(task)

# Train the learner on the training ids
learner$train(task, row_ids = ids$train)

# print the model
print(learner$model)

# importance method
if("importance" %in% learner$properties) print(learner$importance)

# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)

# Score the predictions
predictions$score()
}

## End(Not run)

## Not run: 
# Train learner with early stopping on the mtcars data set
task = tsk("mtcars")

# Split task into training and test set
split = partition(task, ratio = 0.8)
task$set_row_roles(split$test, "test")

# Set early stopping parameter
learner = lrn("regr.xgboost",
  nrounds = 100,
  early_stopping_rounds = 10,
  early_stopping_set = "test"
)

# Train learner with early stopping
learner$train(task)
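
# After training, xgboost records the round chosen by early stopping;
# best_iteration/best_score are fields set on the booster when early
# stopping triggers (availability may depend on the xgboost version)
print(learner$model$best_iteration)
print(learner$model$best_score)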

## End(Not run)
