| xgb.train | R Documentation |
Fits an XGBoost model to given data in DMatrix format (e.g. as produced by xgb.DMatrix()).
See the tutorial Introduction to Boosted Trees
for a longer explanation of what XGBoost does, and the rest of the
XGBoost Tutorials for further
explanations of XGBoost's features and usage.
Compared to function xgboost() which is a user-friendly function targeted towards interactive
usage, xgb.train is a lower-level interface which allows finer-grained control and exposes
further functionalities offered by the core library (such as learning-to-rank objectives), but
which works exclusively with XGBoost's own data format ("DMatrices") instead of with regular R
objects.
The syntax of this function closely mimics the same function from the Python package for XGBoost,
and it is recommended over xgboost() for package developers, as it provides a more
stable interface (with fewer breaking changes) and lower overhead from data validations.
See also the migration guide if coming from a previous version of XGBoost in the 1.x series.
xgb.train(
  params = xgb.params(),
  data,
  nrounds,
  evals = list(),
  objective = NULL,
  custom_metric = NULL,
  verbose = 1,
  print_every_n = 1L,
  early_stopping_rounds = NULL,
  maximize = NULL,
  save_period = NULL,
  save_name = "xgboost.model",
  xgb_model = NULL,
  callbacks = list(),
  ...
)
params |
List of XGBoost parameters which control the model building process.
See the online documentation
and the documentation for xgb.params() for details. Should be passed as a list with named entries. Parameters that are not specified in this list will use their default values. A list of named parameters can be created through the function xgb.params(). |
data |
Training dataset, in xgb.DMatrix format. Note that there is a function xgboost() which works with regular R objects instead. |
nrounds |
Maximum number of boosting iterations. |
evals |
Named list of xgb.DMatrix datasets on which to evaluate the model during training, e.g. list(train = dtrain, eval = dtest). |
objective |
Customized objective function. Should take two arguments: the first one will be the
current predictions (either a numeric vector or matrix depending on the number of targets / classes),
and the second one will be the xgb.DMatrix object used for training. It should return a list with two elements, grad and hess. |
custom_metric |
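Customized evaluation function. Just like objective, it should take two arguments: the predictions and an xgb.DMatrix object. Should return a list with two elements, metric (the name of the metric) and value (its computed value). |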
verbose |
If 0, xgboost will stay silent. If 1, it will print information about performance.
If 2, some additional information will be printed out.
Note that setting verbose > 0 automatically engages the xgb.cb.print.evaluation() callback. |
print_every_n |
When passing verbose > 0, evaluation results are printed every print_every_n iterations. Only has an effect when passing data under evals. |
early_stopping_rounds |
Number of boosting rounds after which training will be stopped
if there is no improvement in performance (as measured by the evaluation metric that is
supplied or selected by default for the objective) on the evaluation data passed under
evals. Must pass evals in order to use this functionality. If NULL, early stopping will not be used. |
maximize |
If |
save_period |
When not NULL, the model is saved to disk periodically, every save_period rounds. |
save_name |
The name or path of the periodically saved model file. |
xgb_model |
A previously built model to continue the training from.
Could be either an object of class |
callbacks |
A list of callback functions to perform various tasks during boosting.
See xgb.Callback(). Note that some callbacks might try to leave attributes in the resulting model object,
such as an evaluation log (a |
... |
Not used. Some arguments that were part of this function in previous XGBoost versions are currently deprecated or have been renamed. If a deprecated or renamed argument is passed, will throw a warning (by default) and use its current equivalent instead. This warning will become an error if using the 'strict mode' option. If some additional argument is passed that is neither a current function argument nor a deprecated or renamed argument, a warning or error will be thrown depending on the 'strict mode' option. Important: |
Compared to xgboost(), the xgb.train() interface supports advanced features such as
evals, customized objective and evaluation metric functions, among others, with the
difference that these work on xgb.DMatrix objects and do not follow typical R idioms.
Parallelization is automatically enabled if OpenMP is present.
Number of threads can also be manually specified via the nthread parameter.
While in other XGBoost language bindings the random seed defaults to zero, in R, if a parameter seed
is not manually supplied, a random seed is generated through R's own random number generator,
whose seed in turn is controllable through set.seed. If seed is passed, it will override the
RNG from R.
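The seed behaviour described above can be checked with a small sketch. It assumes the xgboost package (3.x interface) is installed, uses subsample < 1 only so that training actually depends on the seed, and introduces the helper name fit_once purely for illustration.

```r
library(xgboost)
data(agaricus.train, package = "xgboost")

dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 1))

# subsample < 1 makes the fitted model depend on the random seed
params <- xgb.params(
  objective = "binary:logistic",
  max_depth = 2,
  subsample = 0.5,
  nthread = 1
)

# Hypothetical helper: fit a small model and return its training predictions
fit_once <- function(p) {
  bst <- xgb.train(p, dtrain, nrounds = 5, verbose = 0)
  predict(bst, dtrain)
}

# No 'seed' in params: the seed is drawn from R's RNG, so set.seed() controls it
set.seed(123)
p1 <- fit_once(params)
set.seed(123)
p2 <- fit_once(params)
same_with_set_seed <- isTRUE(all.equal(p1, p2))

# Explicit 'seed' in params: the state of R's RNG no longer matters
params_seeded <- modifyList(params, list(seed = 42))
set.seed(1)
p3 <- fit_once(params_seeded)
set.seed(2)
p4 <- fit_once(params_seeded)
same_with_param_seed <- isTRUE(all.equal(p3, p4))
```

Both flags are expected to be TRUE when training is single-threaded, as it is here.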
The following callbacks are automatically created when certain parameters are set:
xgb.cb.print.evaluation(): when verbose > 0. The print_every_n parameter is passed to it.
xgb.cb.evaluation.log(): when evals is present.
xgb.cb.early.stop(): when early_stopping_rounds is set.
xgb.cb.save.model(): when save_period > 0 is set.
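As a rough illustration of the automatic callbacks, the sketch below passes evals with verbose = 0, so the evaluation log is recorded without anything being printed. It assumes the log is exposed as the R attribute evaluation_log on the returned booster, which is how recent versions of the package behave as far as I know; treat that attribute name as an assumption.

```r
library(xgboost)
data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")

dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 1))
dtest <- with(agaricus.test, xgb.DMatrix(data, label = label, nthread = 1))

params <- xgb.params(
  objective = "binary:logistic",
  eval_metric = "auc",
  max_depth = 2,
  nthread = 1
)

# 'evals' engages xgb.cb.evaluation.log(); verbose = 0 keeps
# xgb.cb.print.evaluation() switched off
bst <- xgb.train(
  params,
  dtrain,
  nrounds = 4,
  evals = list(train = dtrain, eval = dtest),
  verbose = 0
)

# Assumption: the log is stored under the R attribute 'evaluation_log'
eval_log <- attributes(bst)$evaluation_log
print(eval_log)
```

The log should contain one row per boosting round, with one metric column per dataset in evals.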
Note that objects of type xgb.Booster as returned by this function behave a bit differently
from typical R objects (it's an 'altrep' list class), and make a separation between two kinds
of attributes. Internal booster attributes (restricted to JSON-serializable data) are accessed
through xgb.attr() and shared between interfaces through serialization functions like xgb.save().
R-specific attributes (typically the result from a callback) are accessed through attributes()
and attr(); they are only used in the R interface, only kept when using R's serializers like
saveRDS(), and not used in any way by functions like predict.xgb.Booster().
Be aware that one such R attribute that is automatically added is params - this attribute
is assigned from the params argument to this function, and is only meant to serve as a
reference for what went into the booster, but is not used in other methods that take a booster
object - so for example, changing the booster's configuration requires calling xgb.config<-
or xgb.model.parameters<-, while simply modifying attributes(model)$params$<...> will have no
effect elsewhere.
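The two kinds of attributes can be contrasted with a short sketch, assuming the xgboost package (3.x interface) is installed. The attribute name "note" and the variable names are made up for illustration.

```r
library(xgboost)
data(agaricus.train, package = "xgboost")

dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 1))
bst <- xgb.train(
  xgb.params(objective = "binary:logistic", max_depth = 2, nthread = 1),
  dtrain,
  nrounds = 2,
  verbose = 0
)

# Internal booster attribute: stored inside the model itself
xgb.attr(bst, "note") <- "trained on agaricus"

# R-specific attribute added by xgb.train(): a record of what was passed
max_depth_record <- attributes(bst)$params$max_depth

# Round trip through XGBoost's own serializer
fname <- tempfile(fileext = ".ubj")
xgb.save(bst, fname)
bst_loaded <- xgb.load(fname)

# The internal attribute travels with the saved model
note_after_load <- xgb.attr(bst_loaded, "note")
unlink(fname)
```

Here note_after_load should equal the string set before saving, while R-specific attributes such as params are only guaranteed to survive R's own serializers like saveRDS().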
An object of class xgb.Booster.
Tianqi Chen and Carlos Guestrin, "XGBoost: A Scalable Tree Boosting System", 22nd SIGKDD Conference on Knowledge Discovery and Data Mining, 2016, https://arxiv.org/abs/1603.02754
xgb.Callback(), predict.xgb.Booster(), xgb.cv()
data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")
## Keep the number of threads to 1 for examples
nthread <- 1
data.table::setDTthreads(nthread)
dtrain <- with(
  agaricus.train, xgb.DMatrix(data, label = label, nthread = nthread)
)
dtest <- with(
  agaricus.test, xgb.DMatrix(data, label = label, nthread = nthread)
)
evals <- list(train = dtrain, eval = dtest)
## A simple xgb.train example:
param <- xgb.params(
  max_depth = 2,
  nthread = nthread,
  objective = "binary:logistic",
  eval_metric = "auc"
)
bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0)
## An xgb.train example where custom objective and evaluation metric are
## used:
logregobj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  preds <- 1 / (1 + exp(-preds))
  grad <- preds - labels
  hess <- preds * (1 - preds)
  return(list(grad = grad, hess = hess))
}
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- as.numeric(sum(labels != (preds > 0))) / length(labels)
  return(list(metric = "error", value = err))
}
# These functions could be used by passing them as 'objective' and
# 'eval_metric' parameters in the params list:
param <- xgb.params(
  max_depth = 2,
  nthread = nthread,
  objective = logregobj,
  eval_metric = evalerror
)
bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0)
# ... or as dedicated 'objective' and 'custom_metric' parameters of xgb.train:
bst <- xgb.train(
  within(param, rm("objective", "eval_metric")),
  dtrain, nrounds = 2, evals = evals,
  objective = logregobj, custom_metric = evalerror
)
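The grad and hess returned by logregobj() above are the first and second derivatives of the log loss with respect to the raw (pre-sigmoid) score. The base R sketch below checks this numerically with central finite differences. The helper names logloss and analytic are made up for illustration and are not part of xgboost.

```r
# Log loss as a function of the raw score (hypothetical helper)
logloss <- function(margin, label) {
  p <- 1 / (1 + exp(-margin))
  -(label * log(p) + (1 - label) * log(1 - p))
}

# Analytic derivatives, same formulas as in logregobj()
analytic <- function(margin, label) {
  p <- 1 / (1 + exp(-margin))
  list(grad = p - label, hess = p * (1 - p))
}

margin <- c(-2, -0.5, 0, 0.7, 3)
label <- c(0, 1, 1, 0, 1)
h <- 1e-4

# Central finite differences for the first and second derivatives
num_grad <- (logloss(margin + h, label) - logloss(margin - h, label)) / (2 * h)
num_hess <- (logloss(margin + h, label) - 2 * logloss(margin, label) +
  logloss(margin - h, label)) / h^2

an <- analytic(margin, label)
max_grad_err <- max(abs(num_grad - an$grad))
max_hess_err <- max(abs(num_hess - an$hess))
```

Both errors should be tiny. This is also why evalerror() thresholds the predictions at 0 rather than 0.5: with a custom objective, it is written to work on raw scores.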
## An xgb.train example of using variable learning rates at each iteration:
param <- xgb.params(
  max_depth = 2,
  learning_rate = 1,
  nthread = nthread,
  objective = "binary:logistic",
  eval_metric = "auc"
)
my_learning_rates <- list(learning_rate = c(0.5, 0.1))
bst <- xgb.train(
  param,
  dtrain,
  nrounds = 2,
  evals = evals,
  verbose = 0,
  callbacks = list(xgb.cb.reset.parameters(my_learning_rates))
)
## Early stopping:
bst <- xgb.train(
  param, dtrain, nrounds = 25, evals = evals, early_stopping_rounds = 3
)