mlr_learners_regr.xgboost | R Documentation
eXtreme Gradient Boosting regression.
Calls xgboost::xgb.train() from package xgboost.
To compute on GPUs, you first need to compile xgboost yourself and link against CUDA. See https://xgboost.readthedocs.io/en/stable/build.html#building-with-gpu-support.
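As a hedged sketch, assuming xgboost 2.0 or later built with CUDA support, GPU training would be requested through the device and tree_method parameters from the parameter table. The value "cuda" follows the upstream xgboost convention and is an assumption about your build. Constructing the learner needs no GPU; only $train() will fail if xgboost was not compiled against CUDA.

```r
library(mlr3)
library(mlr3learners)

# Request GPU training; assumes xgboost >= 2.0 compiled with CUDA support.
# Older xgboost versions used tree_method = "gpu_hist" instead.
learner = lrn("regr.xgboost", device = "cuda", tree_method = "hist")

# Only parameter values are set here; no GPU is touched until $train() runs.
print(learner$param_set$values[c("device", "tree_method")])
```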
Note that using the watchlist parameter directly will lead to problems when wrapping this mlr3::Learner in a mlr3pipelines GraphLearner, as the preprocessing steps will not be applied to the data in the watchlist. See the section Early Stopping and Validation for the recommended way to supply validation data.
If a Task has a column with the role offset, it will automatically be used during training. The offset is incorporated through the xgboost::xgb.DMatrix interface, using the base_margin field. No offset is applied during prediction for this learner.
This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():
mlr_learners$get("regr.xgboost")
lrn("regr.xgboost")
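Both constructors return the same learner; lrn() additionally accepts hyperparameter values as named arguments. The values eta = 0.1 and max_depth = 4 below are arbitrary illustrations, not recommendations.

```r
library(mlr3)
library(mlr3learners)

# Equivalent ways to retrieve the learner from the dictionary
l1 = mlr_learners$get("regr.xgboost")
l2 = lrn("regr.xgboost")

# Hyperparameters can be set directly at construction
l3 = lrn("regr.xgboost", eta = 0.1, max_depth = 4)
```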
Task type: “regr”
Predict Types: “response”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3learners, xgboost
Id | Type | Default | Levels | Range
alpha | numeric | 0 | - | [0, ∞)
approxcontrib | logical | FALSE | TRUE, FALSE | -
base_score | numeric | 0.5 | - | (-∞, ∞)
booster | character | gbtree | gbtree, gblinear, dart | -
callbacks | untyped | list() | - | -
colsample_bylevel | numeric | 1 | - | [0, 1]
colsample_bynode | numeric | 1 | - | [0, 1]
colsample_bytree | numeric | 1 | - | [0, 1]
device | untyped | "cpu" | - | -
disable_default_eval_metric | logical | FALSE | TRUE, FALSE | -
early_stopping_rounds | integer | NULL | - | [1, ∞)
eta | numeric | 0.3 | - | [0, 1]
eval_metric | untyped | "rmse" | - | -
feature_selector | character | cyclic | cyclic, shuffle, random, greedy, thrifty | -
gamma | numeric | 0 | - | [0, ∞)
grow_policy | character | depthwise | depthwise, lossguide | -
interaction_constraints | untyped | - | - | -
iterationrange | untyped | - | - | -
lambda | numeric | 1 | - | [0, ∞)
lambda_bias | numeric | 0 | - | [0, ∞)
max_bin | integer | 256 | - | [2, ∞)
max_delta_step | numeric | 0 | - | [0, ∞)
max_depth | integer | 6 | - | [0, ∞)
max_leaves | integer | 0 | - | [0, ∞)
maximize | logical | NULL | TRUE, FALSE | -
min_child_weight | numeric | 1 | - | [0, ∞)
missing | numeric | NA | - | (-∞, ∞)
monotone_constraints | untyped | 0 | - | -
normalize_type | character | tree | tree, forest | -
nrounds | integer | - | - | [1, ∞)
nthread | integer | 1 | - | [1, ∞)
ntreelimit | integer | NULL | - | [1, ∞)
num_parallel_tree | integer | 1 | - | [1, ∞)
objective | untyped | "reg:squarederror" | - | -
one_drop | logical | FALSE | TRUE, FALSE | -
outputmargin | logical | FALSE | TRUE, FALSE | -
predcontrib | logical | FALSE | TRUE, FALSE | -
predinteraction | logical | FALSE | TRUE, FALSE | -
predleaf | logical | FALSE | TRUE, FALSE | -
print_every_n | integer | 1 | - | [1, ∞)
process_type | character | default | default, update | -
rate_drop | numeric | 0 | - | [0, 1]
refresh_leaf | logical | TRUE | TRUE, FALSE | -
reshape | logical | FALSE | TRUE, FALSE | -
sampling_method | character | uniform | uniform, gradient_based | -
sample_type | character | uniform | uniform, weighted | -
save_name | untyped | NULL | - | -
save_period | integer | NULL | - | [0, ∞)
scale_pos_weight | numeric | 1 | - | (-∞, ∞)
seed_per_iteration | logical | FALSE | TRUE, FALSE | -
skip_drop | numeric | 0 | - | [0, 1]
strict_shape | logical | FALSE | TRUE, FALSE | -
subsample | numeric | 1 | - | [0, 1]
top_k | integer | 0 | - | [0, ∞)
training | logical | FALSE | TRUE, FALSE | -
tree_method | character | auto | auto, exact, approx, hist, gpu_hist | -
tweedie_variance_power | numeric | 1.5 | - | [1, 2]
updater | untyped | - | - | -
verbose | integer | 1 | - | [0, 2]
watchlist | untyped | NULL | - | -
xgb_model | untyped | NULL | - | -
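The Range and Levels columns are enforced when values are assigned. The sketch below shows an out-of-range value being rejected and valid values being accepted; the specific values are arbitrary illustrations.

```r
library(mlr3)
library(mlr3learners)

# eta must lie in [0, 1], so this assignment is rejected with an error.
res = try(lrn("regr.xgboost", eta = 2), silent = TRUE)

# Valid values, including one of the listed Levels for a character parameter
learner = lrn("regr.xgboost", eta = 0.05, max_depth = 3, booster = "dart")
learner$param_set$values[c("eta", "max_depth", "booster")]
```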
Early Stopping and Validation
To monitor validation performance during training, set the $validate field of the Learner. For information on how to configure the validation set, see the Validation section of mlr3::Learner.
This validation data can also be used for early stopping, which is enabled by setting the early_stopping_rounds parameter. The final validation scores (or, with early stopping, the best ones) can be accessed via $internal_valid_scores, and the optimal nrounds via $internal_tuned_values.
The internal validation measure can be set via the eval_metric parameter, which can be an mlr3::Measure, a function, or a character string naming one of the internal xgboost measures. Using an mlr3::Measure is slower than the internal xgboost measures, but allows using the same measure for tuning and validation.
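A hedged sketch of early stopping with an mlr3::Measure as the internal validation metric. The choice of msr("regr.mae"), the 30 percent validation ratio, and the seed are illustrative assumptions.

```r
library(mlr3)
library(mlr3learners)

set.seed(42)
task = tsk("mtcars")

# Use MAE (an mlr3 Measure) for validation and hold out 30 percent
# of the training data as the validation set.
learner = lrn("regr.xgboost",
  nrounds = 100,
  early_stopping_rounds = 10,
  eval_metric = msr("regr.mae"),
  validate = 0.3
)
learner$train(task)

# Validation score at the best iteration and the early-stopped nrounds
learner$internal_valid_scores
learner$internal_tuned_values
```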
Initial parameter values
nrounds:
Actual default: no default.
Adjusted default: 1000.
Reason for change: Without a default, constructing the learner would error. The lightgbm learner has a default of 1000, so the same value is used here.
nthread:
Actual value: Undefined, triggering auto-detection of the number of CPUs.
Adjusted value: 1.
Reason for change: Conflicts with parallelization via future.
verbose:
Actual default: 1.
Adjusted default: 0.
Reason for change: Reduce verbosity.
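The adjusted initial values can be inspected and overridden like any other parameter. Using 4 threads below is an arbitrary example, sensible only when you are not already parallelizing via future.

```r
library(mlr3)
library(mlr3learners)

learner = lrn("regr.xgboost")

# Inspect the adjusted initial values
learner$param_set$values[c("nrounds", "nthread", "verbose")]

# Override them explicitly
learner$param_set$values$nthread = 4
learner$param_set$values$verbose = 1
```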
Super classes: mlr3::Learner -> mlr3::LearnerRegr -> LearnerRegrXgboost
Active bindings:
internal_valid_scores (named list() or NULL)
The validation scores extracted from model$evaluation_log. If early stopping is activated, this contains the validation scores of the model for the optimal nrounds; otherwise, it contains the scores for the final nrounds.
internal_tuned_values (named list() or NULL)
If early stopping is activated, this returns a list with nrounds, which is extracted from $best_iteration of the model; otherwise it is NULL.
validate (numeric(1) or character(1) or NULL)
How to construct the internal validation data. This parameter can be either NULL, a ratio, "test", or "predefined".
Returns the $best_iteration when early stopping is activated.
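A sketch of the "predefined" option, assuming an mlr3 version in which assigning row ids to task$internal_valid_task moves those rows out of the primary task into a separate validation task. The 70/30 split and seed are illustrative.

```r
library(mlr3)
library(mlr3learners)

set.seed(1)
task = tsk("mtcars")
ids = partition(task, ratio = 0.7)

# Move the held-out rows into the task's internal validation task;
# they are removed from the primary (training) task.
task$internal_valid_task = ids$test

learner = lrn("regr.xgboost", nrounds = 100, early_stopping_rounds = 10)
learner$validate = "predefined"
learner$train(task)

# nrounds selected by early stopping on the predefined validation rows
learner$internal_tuned_values$nrounds
```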
new()
Creates a new instance of this R6 class.
LearnerRegrXgboost$new()
importance()
The importance scores are calculated with xgboost::xgb.importance().
LearnerRegrXgboost$importance()
Returns a named numeric().
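A minimal sketch of retrieving importance scores from a trained model; nrounds = 50 is arbitrary. Only features used in at least one split are expected to appear in the result.

```r
library(mlr3)
library(mlr3learners)

task = tsk("mtcars")
learner = lrn("regr.xgboost", nrounds = 50)
learner$train(task)

# Importance scores from xgboost::xgb.importance(), named by feature
imp = learner$importance()
head(imp)
```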
clone()
The objects of this class are cloneable with this method.
LearnerRegrXgboost$clone(deep = FALSE)
deep: Whether to make a deep clone.
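A deep clone copies the learner's internal objects, such as its parameter set, so changes to the copy do not affect the original. With deep = FALSE, R6 objects held by the learner may be shared between the copies. The value eta = 0.01 is arbitrary.

```r
library(mlr3)
library(mlr3learners)

learner = lrn("regr.xgboost")

# Modify a deep copy; the original learner stays unchanged.
copy = learner$clone(deep = TRUE)
copy$param_set$values$eta = 0.01
```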
Chen, Tianqi, Guestrin, Carlos (2016). “Xgboost: A scalable tree boosting system.” In Proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 785–794. ACM. doi:10.1145/2939672.2939785.
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html#sec-learners
Package mlr3extralearners for more learners.
Dictionary of Learners: mlr3::mlr_learners
as.data.table(mlr_learners) for a table of available Learners in the running session (depending on the loaded packages).
mlr3pipelines to combine learners with pre- and postprocessing steps.
Extension packages for additional task types:
mlr3proba for probabilistic supervised regression and survival analysis.
mlr3cluster for unsupervised clustering.
mlr3tuning for tuning of hyperparameters, mlr3tuningspaces for established default tuning spaces.
Other Learner: mlr_learners_classif.cv_glmnet, mlr_learners_classif.glmnet, mlr_learners_classif.kknn, mlr_learners_classif.lda, mlr_learners_classif.log_reg, mlr_learners_classif.multinom, mlr_learners_classif.naive_bayes, mlr_learners_classif.nnet, mlr_learners_classif.qda, mlr_learners_classif.ranger, mlr_learners_classif.svm, mlr_learners_classif.xgboost, mlr_learners_regr.cv_glmnet, mlr_learners_regr.glmnet, mlr_learners_regr.kknn, mlr_learners_regr.km, mlr_learners_regr.lm, mlr_learners_regr.nnet, mlr_learners_regr.ranger, mlr_learners_regr.svm
## Not run:
if (requireNamespace("xgboost", quietly = TRUE)) {
# Define the Learner and set parameter values
learner = lrn("regr.xgboost")
print(learner)
# Define a Task
task = tsk("mtcars")
# Create train and test set
ids = partition(task)
# Train the learner on the training ids
learner$train(task, row_ids = ids$train)
# print the model
print(learner$model)
# importance method
if ("importance" %in% learner$properties) print(learner$importance())
# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)
# Score the predictions
predictions$score()
}
## End(Not run)
## Not run:
# Train learner with early stopping on the mtcars data set
task = tsk("mtcars")
# Set early stopping parameters and use 30 percent of the data for validation
learner = lrn("regr.xgboost",
nrounds = 100,
early_stopping_rounds = 10,
validate = 0.3
)
# Train learner with early stopping
learner$train(task)
# Inspect optimal nrounds and validation performance
learner$internal_tuned_values
learner$internal_valid_scores
## End(Not run)