mlr_learners_regr.xgboost: Extreme Gradient Boosting Regression Learner

Extreme Gradient Boosting Regression Learner

Description

eXtreme Gradient Boosting regression. Calls xgboost::xgb.train() from package xgboost.

Note that setting the watchlist parameter directly will lead to problems when this mlr3::Learner is wrapped in a mlr3pipelines GraphLearner, as the preprocessing steps will not be applied to the data in the watchlist. See the section Early Stopping and Validation for the recommended way to configure validation data.

Offset

If a Task has a column with the role offset, it will automatically be used during training. The offset is incorporated through the xgboost::xgb.DMatrix interface, using the base_margin field. No offset is applied during prediction for this learner.
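
For example, a minimal sketch of assigning the offset role to a task column (the choice of column is purely illustrative, and an mlr3 version that supports the "offset" column role is assumed):

library(mlr3)
library(mlr3learners)

task = tsk("mtcars")
# "wt" is used as offset here for illustration only; it is then passed
# to xgboost as base_margin instead of being used as a feature
task$set_col_roles("wt", roles = "offset")

learner = lrn("regr.xgboost", nrounds = 50)
learner$train(task)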

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("regr.xgboost")
lrn("regr.xgboost")

Meta Information

  • Task type: “regr”

  • Predict Types: “response”

  • Feature Types: “logical”, “integer”, “numeric”

  • Required Packages: mlr3, mlr3learners, xgboost
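
This meta information can also be inspected on a constructed learner, e.g.:

learner = lrn("regr.xgboost")
learner$feature_types
learner$predict_types
learner$packages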

Parameters

Id Type Default Levels Range
alpha numeric 0 [0, Inf)
approxcontrib logical FALSE TRUE, FALSE -
base_score numeric 0.5 (-Inf, Inf)
booster character gbtree gbtree, gblinear, dart -
callbacks untyped list() -
colsample_bylevel numeric 1 [0, 1]
colsample_bynode numeric 1 [0, 1]
colsample_bytree numeric 1 [0, 1]
device untyped "cpu" -
disable_default_eval_metric logical FALSE TRUE, FALSE -
early_stopping_rounds integer NULL [1, Inf)
eta numeric 0.3 [0, 1]
eval_metric untyped "rmse" -
feature_selector character cyclic cyclic, shuffle, random, greedy, thrifty -
gamma numeric 0 [0, Inf)
grow_policy character depthwise depthwise, lossguide -
interaction_constraints untyped - -
iterationrange untyped - -
lambda numeric 1 [0, Inf)
lambda_bias numeric 0 [0, Inf)
max_bin integer 256 [2, Inf)
max_delta_step numeric 0 [0, Inf)
max_depth integer 6 [0, Inf)
max_leaves integer 0 [0, Inf)
maximize logical NULL TRUE, FALSE -
min_child_weight numeric 1 [0, Inf)
missing numeric NA (-Inf, Inf)
monotone_constraints untyped 0 -
normalize_type character tree tree, forest -
nrounds integer - [1, Inf)
nthread integer 1 [1, Inf)
ntreelimit integer NULL [1, Inf)
num_parallel_tree integer 1 [1, Inf)
objective untyped "reg:squarederror" -
one_drop logical FALSE TRUE, FALSE -
outputmargin logical FALSE TRUE, FALSE -
predcontrib logical FALSE TRUE, FALSE -
predinteraction logical FALSE TRUE, FALSE -
predleaf logical FALSE TRUE, FALSE -
print_every_n integer 1 [1, Inf)
process_type character default default, update -
rate_drop numeric 0 [0, 1]
refresh_leaf logical TRUE TRUE, FALSE -
reshape logical FALSE TRUE, FALSE -
sampling_method character uniform uniform, gradient_based -
sample_type character uniform uniform, weighted -
save_name untyped NULL -
save_period integer NULL [0, Inf)
scale_pos_weight numeric 1 (-Inf, Inf)
seed_per_iteration logical FALSE TRUE, FALSE -
skip_drop numeric 0 [0, 1]
strict_shape logical FALSE TRUE, FALSE -
subsample numeric 1 [0, 1]
top_k integer 0 [0, Inf)
training logical FALSE TRUE, FALSE -
tree_method character auto auto, exact, approx, hist, gpu_hist -
tweedie_variance_power numeric 1.5 [1, 2]
updater untyped - -
verbose integer 1 [0, 2]
watchlist untyped NULL -
xgb_model untyped NULL -

Early Stopping and Validation

To monitor validation performance during training, set the $validate field of the Learner. For information on how to configure the validation set, see the Validation section of mlr3::Learner. The validation data can also be used for early stopping, which is enabled by setting the early_stopping_rounds parameter. The final (or, with early stopping, the best) validation scores can be accessed via $internal_valid_scores, and the optimal nrounds via $internal_tuned_values. The internal validation measure is set via the eval_metric parameter, which accepts an mlr3::Measure, a function, or a character string naming one of xgboost's built-in measures. Using an mlr3::Measure is slower than the built-in measures, but allows using the same measure for tuning and validation.
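
For example, a sketch of early stopping with an mlr3::Measure as the validation metric (the parameter values are illustrative, not recommendations):

library(mlr3)
library(mlr3learners)

learner = lrn("regr.xgboost",
  nrounds = 500,
  early_stopping_rounds = 20,
  validate = 0.2,                # hold out 20% of the training data
  eval_metric = msr("regr.mae")  # any regression Measure can be used here
)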

Initial parameter values

  • nrounds:

    • Actual default: no default.

    • Adjusted default: 1000.

    • Reason for change: Without a default, construction of the learner would error. The lightgbm learner has a default of 1000, so we use the same here.

  • nthread:

    • Actual value: Undefined, triggering auto-detection of the number of CPUs.

    • Adjusted value: 1.

    • Reason for change: Conflicts with parallelization via the future package.

  • verbose:

    • Actual default: 1.

    • Adjusted default: 0.

    • Reason for change: Reduce verbosity.
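
These adjusted values are only initializations; they can be overridden when constructing the learner, e.g.:

learner = lrn("regr.xgboost", nrounds = 500, nthread = 4, verbose = 1)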

Super classes

mlr3::Learner -> mlr3::LearnerRegr -> LearnerRegrXgboost

Active bindings

internal_valid_scores

(named list() or NULL) The validation scores extracted from model$evaluation_log. If early stopping is activated, these are the validation scores of the model at the optimal nrounds; otherwise, the scores of the final model.

internal_tuned_values

(named list() or NULL) If early stopping is activated, this returns a named list containing nrounds, extracted from the model's $best_iteration; otherwise NULL.

validate

(numeric(1) or character(1) or NULL) How to construct the internal validation data. This parameter can be either NULL, a ratio, "test", or "predefined".

Methods

Public methods

Inherited methods

Method new()

Creates a new instance of this R6 class.

Usage
LearnerRegrXgboost$new()

Method importance()

The importance scores are calculated with xgboost::xgb.importance().

Usage
LearnerRegrXgboost$importance()
Returns

Named numeric().


Method clone()

The objects of this class are cloneable with this method.

Usage
LearnerRegrXgboost$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Note

To compute on GPUs, you first need to compile xgboost yourself and link against CUDA. See https://xgboost.readthedocs.io/en/stable/build.html#building-with-gpu-support.

References

Chen, Tianqi, Guestrin, Carlos (2016). “Xgboost: A scalable tree boosting system.” In Proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 785–794. ACM. doi:10.1145/2939672.2939785.

See Also

Other Learner: mlr_learners_classif.cv_glmnet, mlr_learners_classif.glmnet, mlr_learners_classif.kknn, mlr_learners_classif.lda, mlr_learners_classif.log_reg, mlr_learners_classif.multinom, mlr_learners_classif.naive_bayes, mlr_learners_classif.nnet, mlr_learners_classif.qda, mlr_learners_classif.ranger, mlr_learners_classif.svm, mlr_learners_classif.xgboost, mlr_learners_regr.cv_glmnet, mlr_learners_regr.glmnet, mlr_learners_regr.kknn, mlr_learners_regr.km, mlr_learners_regr.lm, mlr_learners_regr.nnet, mlr_learners_regr.ranger, mlr_learners_regr.svm

Examples

## Not run: 
if (requireNamespace("xgboost", quietly = TRUE)) {
# Define the Learner and set parameter values
learner = lrn("regr.xgboost")
print(learner)

# Define a Task
task = tsk("mtcars")

# Create train and test set
ids = partition(task)

# Train the learner on the training ids
learner$train(task, row_ids = ids$train)

# print the model
print(learner$model)

# importance method
if("importance" %in% learner$properties) print(learner$importance)

# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)

# Score the predictions
predictions$score()
}

## End(Not run)

## Not run: 
# Train learner with early stopping on the mtcars data set
task = tsk("mtcars")

# Use 30 percent of the data for validation and enable early stopping
learner = lrn("regr.xgboost",
  nrounds = 100,
  early_stopping_rounds = 10,
  validate = 0.3
)

# Train learner with early stopping
learner$train(task)

# Inspect optimal nrounds and validation performance
learner$internal_tuned_values
learner$internal_valid_scores

## End(Not run)
