mlr_learners_classif.xgboost: Extreme Gradient Boosting Classification Learner

mlr_learners_classif.xgboostR Documentation

Extreme Gradient Boosting Classification Learner

Description

eXtreme Gradient Boosting classification. Calls xgboost::xgb.train() from package xgboost.

If not specified otherwise, the evaluation metric is set to the default "logloss" for binary classification problems and set to "mlogloss" for multiclass problems. This was necessary to silence a deprecation warning.

Note that using the watchlist parameter directly will lead to problems when wrapping this mlr3::Learner in a mlr3pipelines GraphLearner as the preprocessing steps will not be applied to the data in the watchlist. See the section Early Stopping and Validation on how to do this.

Initial parameter values

  • nrounds:

    • Actual default: no default.

    • Adjusted default: 1.

    • Reason for change: Without a default construction of the learner would error. Just setting a nonsense default to workaround this. nrounds needs to be tuned by the user.

  • nthread:

    • Actual value: Undefined, triggering auto-detection of the number of CPUs.

    • Adjusted value: 1.

    • Reason for change: Conflicting with parallelization via future.

  • verbose:

    • Actual default: 1.

    • Adjusted default: 0.

    • Reason for change: Reduce verbosity.

Early Stopping and Validation

In order to monitor the validation performance during the training, you can set the ⁠$validate⁠ field of the Learner. For information on how to configure the valdiation set, see the Validation section of mlr3::Learner. This validation data can also be used for early stopping, which can be enabled by setting the early_stopping_rounds parameter. The final (or in the case of early stopping best) validation scores can be accessed via ⁠$internal_valid_scores⁠, and the optimal nrounds via ⁠$internal_tuned_values⁠.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("classif.xgboost")
lrn("classif.xgboost")

Meta Information

  • Task type: “classif”

  • Predict Types: “response”, “prob”

  • Feature Types: “logical”, “integer”, “numeric”

  • Required Packages: mlr3, mlr3learners, xgboost

Parameters

Id Type Default Levels Range
alpha numeric 0 [0, \infty)
approxcontrib logical FALSE TRUE, FALSE -
base_score numeric 0.5 (-\infty, \infty)
booster character gbtree gbtree, gblinear, dart -
callbacks untyped list() -
colsample_bylevel numeric 1 [0, 1]
colsample_bynode numeric 1 [0, 1]
colsample_bytree numeric 1 [0, 1]
device untyped "cpu" -
disable_default_eval_metric logical FALSE TRUE, FALSE -
early_stopping_rounds integer NULL [1, \infty)
eta numeric 0.3 [0, 1]
eval_metric untyped - -
feature_selector character cyclic cyclic, shuffle, random, greedy, thrifty -
feval untyped NULL -
gamma numeric 0 [0, \infty)
grow_policy character depthwise depthwise, lossguide -
interaction_constraints untyped - -
iterationrange untyped - -
lambda numeric 1 [0, \infty)
lambda_bias numeric 0 [0, \infty)
max_bin integer 256 [2, \infty)
max_delta_step numeric 0 [0, \infty)
max_depth integer 6 [0, \infty)
max_leaves integer 0 [0, \infty)
maximize logical NULL TRUE, FALSE -
min_child_weight numeric 1 [0, \infty)
missing numeric NA (-\infty, \infty)
monotone_constraints untyped 0 -
nrounds integer - [1, \infty)
normalize_type character tree tree, forest -
nthread integer 1 [1, \infty)
ntreelimit integer NULL [1, \infty)
num_parallel_tree integer 1 [1, \infty)
objective untyped "binary:logistic" -
one_drop logical FALSE TRUE, FALSE -
outputmargin logical FALSE TRUE, FALSE -
predcontrib logical FALSE TRUE, FALSE -
predinteraction logical FALSE TRUE, FALSE -
predleaf logical FALSE TRUE, FALSE -
print_every_n integer 1 [1, \infty)
process_type character default default, update -
rate_drop numeric 0 [0, 1]
refresh_leaf logical TRUE TRUE, FALSE -
reshape logical FALSE TRUE, FALSE -
seed_per_iteration logical FALSE TRUE, FALSE -
sampling_method character uniform uniform, gradient_based -
sample_type character uniform uniform, weighted -
save_name untyped NULL -
save_period integer NULL [0, \infty)
scale_pos_weight numeric 1 (-\infty, \infty)
skip_drop numeric 0 [0, 1]
strict_shape logical FALSE TRUE, FALSE -
subsample numeric 1 [0, 1]
top_k integer 0 [0, \infty)
training logical FALSE TRUE, FALSE -
tree_method character auto auto, exact, approx, hist, gpu_hist -
tweedie_variance_power numeric 1.5 [1, 2]
updater untyped - -
verbose integer 1 [0, 2]
watchlist untyped NULL -
xgb_model untyped NULL -

Super classes

mlr3::Learner -> mlr3::LearnerClassif -> LearnerClassifXgboost

Active bindings

internal_valid_scores

(named list() or NULL) The validation scores extracted from model$evaluation_log. If early stopping is activated, this contains the validation scores of the model for the optimal nrounds, otherwise the nrounds for the final model.

internal_tuned_values

(named list() or NULL) If early stopping is activated, this returns a list with nrounds, which is extracted from ⁠$best_iteration⁠ of the model and otherwise NULL.

validate

(numeric(1) or character(1) or NULL) How to construct the internal validation data. This parameter can be either NULL, a ratio, "test", or "predefined".

Methods

Public methods

Inherited methods

Method new()

Creates a new instance of this R6 class.

Usage
LearnerClassifXgboost$new()

Method importance()

The importance scores are calculated with xgboost::xgb.importance().

Usage
LearnerClassifXgboost$importance()
Returns

Named numeric().


Method clone()

The objects of this class are cloneable with this method.

Usage
LearnerClassifXgboost$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Note

To compute on GPUs, you first need to compile xgboost yourself and link against CUDA. See https://xgboost.readthedocs.io/en/stable/build.html#building-with-gpu-support.

References

Chen, Tianqi, Guestrin, Carlos (2016). “Xgboost: A scalable tree boosting system.” In Proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 785–794. ACM. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1145/2939672.2939785")}.

See Also

Other Learner: mlr_learners_classif.cv_glmnet, mlr_learners_classif.glmnet, mlr_learners_classif.kknn, mlr_learners_classif.lda, mlr_learners_classif.log_reg, mlr_learners_classif.multinom, mlr_learners_classif.naive_bayes, mlr_learners_classif.nnet, mlr_learners_classif.qda, mlr_learners_classif.ranger, mlr_learners_classif.svm, mlr_learners_regr.cv_glmnet, mlr_learners_regr.glmnet, mlr_learners_regr.kknn, mlr_learners_regr.km, mlr_learners_regr.lm, mlr_learners_regr.nnet, mlr_learners_regr.ranger, mlr_learners_regr.svm, mlr_learners_regr.xgboost

Examples

## Not run: 
if (requireNamespace("xgboost", quietly = TRUE)) {
# Define the Learner and set parameter values
learner = lrn("classif.xgboost")
print(learner)

# Define a Task
task = tsk("sonar")

# Create train and test set
ids = partition(task)

# Train the learner on the training ids
learner$train(task, row_ids = ids$train)

# print the model
print(learner$model)

# importance method
if("importance" %in% learner$properties) print(learner$importance)

# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)

# Score the predictions
predictions$score()
}

## End(Not run)

## Not run: 
# Train learner with early stopping on spam data set
task = tsk("spam")

# use 30 percent for validation
# Set early stopping parameter
learner = lrn("classif.xgboost",
  nrounds = 100,
  early_stopping_rounds = 10,
  validate = 0.3
)

# Train learner with early stopping
learner$train(task)

# Inspect optimal nrounds and validation performance
learner$internal_tuned_values
learner$internal_valid_scores

## End(Not run)

mlr3learners documentation built on June 28, 2024, 5:09 p.m.