xgb.cv (R Documentation)
The cross validation function of xgboost.
xgb.cv(
params = xgb.params(),
data,
nrounds,
nfold,
prediction = FALSE,
showsd = TRUE,
metrics = list(),
objective = NULL,
custom_metric = NULL,
stratified = "auto",
folds = NULL,
train_folds = NULL,
verbose = TRUE,
print_every_n = 1L,
early_stopping_rounds = NULL,
maximize = NULL,
callbacks = list(),
...
)
params: List of XGBoost parameters which control the model building process.
See the online documentation and the documentation for xgb.params() for details. Should be passed as a list with named entries. Parameters that are not specified in this list will use their default values. A list of named parameters can be created through the function xgb.params().
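data: An xgb.DMatrix object, with the fields (such as label) that the objective requires for model training. Note that only the basic xgb.DMatrix class is supported; variants such as xgb.QuantileDMatrix or xgb.ExtMemDMatrix are not supported here.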
nrounds: Maximum number of boosting iterations.
nfold: The original dataset is randomly partitioned into nfold equal-size subsamples.
prediction: A logical value indicating whether to return the test fold predictions
from each CV model. This parameter engages the xgb.cb.cv.predict() callback.
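To make "test fold predictions" concrete, here is a minimal base-R sketch (not xgboost's internal code) of how out-of-fold predictions are assembled: every observation receives the prediction of the one fold model that never trained on it. The fold list and the stand-in "model" (which just predicts the mean training label) are hypothetical.

```r
# Hypothetical sketch: assembling out-of-fold (OOF) predictions by hand.
y <- c(3, 5, 2, 8, 6, 4, 7, 1, 9, 10)
n <- length(y)

# Hypothetical folds: each element holds the test indices of one fold.
folds <- list(c(1, 4, 7, 10), c(2, 5, 8), c(3, 6, 9))

oof_pred <- rep(NA_real_, n)
for (test_idx in folds) {
  train_idx <- setdiff(seq_len(n), test_idx)
  # Stand-in "model": predict the mean label of the training rows.
  oof_pred[test_idx] <- mean(y[train_idx])
}
print(oof_pred)
```

Because the folds cover every row exactly once, no entry of oof_pred is left as NA.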
showsd: Logical value indicating whether to show the standard deviation of the cross-validation results.
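metrics: List of evaluation metrics to be used in cross validation. When it is not specified, the evaluation metric is chosen according to the objective function. Possible options include error (binary classification error rate), rmse (root mean square error), logloss (negative log-likelihood), mae (mean absolute error), mape (mean absolute percentage error), auc (area under the ROC curve), aucpr (area under the PR curve) and merror (exact matching error, used to evaluate multi-class classification).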
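As a reference for what two of these metric names measure, the base-R sketch below computes RMSE and a rank-based (Mann-Whitney) AUC for binary labels. It illustrates the textbook definitions only; XGBoost's own implementations may differ in details such as weighting or tie handling, and the function names are hypothetical.

```r
# Root mean square error between predictions and labels.
rmse <- function(pred, label) sqrt(mean((pred - label)^2))

# AUC via the Mann-Whitney U statistic; tied predictions get average ranks.
auc <- function(pred, label) {
  r <- rank(pred)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

label <- c(0, 0, 1, 1)
pred  <- c(0.1, 0.4, 0.35, 0.8)
rmse(pred, label)
auc(pred, label)
```

Here three of the four positive/negative pairs are ranked correctly, so the AUC is 0.75.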
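objective: Customized objective function. Should take two arguments: the first one will be the
current predictions (either a numeric vector or matrix depending on the number of targets / classes),
and the second one will be the data DMatrix object that is used for training. It should return a list with two elements, grad and hess (in that order), as either numeric vectors or numeric matrices depending on the number of targets / classes (same dimensions as the predictions passed as the first argument).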
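A minimal sketch of a custom objective with the (predictions, DMatrix) signature described above, here the binary logistic loss. It assumes, as is the usual convention for custom objectives, that the predictions are raw margin scores rather than probabilities; the names logistic_grad_hess and logistic_obj are hypothetical. The math is kept in a plain helper so it can be checked without an xgb.DMatrix.

```r
# Gradient and Hessian of the logistic loss with respect to the raw margin.
logistic_grad_hess <- function(margin, label) {
  p <- 1 / (1 + exp(-margin))  # sigmoid: margin -> probability
  list(grad = p - label, hess = p * (1 - p))
}

# Wrapper with the (preds, data) signature; getinfo() reads the label
# field from the xgb.DMatrix passed as the second argument.
logistic_obj <- function(preds, dtrain) {
  label <- xgboost::getinfo(dtrain, "label")
  logistic_grad_hess(preds, label)
}

logistic_grad_hess(c(0, 2), c(1, 0))
```

The wrapper would then be passed as objective = logistic_obj. At a margin of 0 with label 1, the gradient is -0.5 and the Hessian 0.25.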
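custom_metric: Customized evaluation function. Just like objective, it should take two arguments, the first one being the predictions and the second one the data DMatrix. It should return a list with two elements: metric (the name displayed for this metric, a character string) and value (the number the function calculates, a numeric scalar). Note that even if passing custom_metric, objectives also have an associated default metric that will be evaluated in addition to it. To disable the built-in metric, pass the parameter disable_default_eval_metric = TRUE.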
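A minimal sketch of a custom metric returning the list(metric, value) pair described above, here a classification error rate. It assumes the predictions are raw margins (as with a custom objective), so a positive margin means class 1; with the built-in binary:logistic objective the predictions would be probabilities and the threshold 0.5 instead. The function names are hypothetical.

```r
# Error rate given raw margins: a positive margin predicts class 1.
classification_error <- function(margin, label) {
  err <- mean(as.numeric(margin > 0) != label)
  list(metric = "custom_error", value = err)
}

# Wrapper with the (preds, data) signature expected by custom_metric.
custom_error_metric <- function(preds, dtrain) {
  classification_error(preds, xgboost::getinfo(dtrain, "label"))
}

classification_error(c(-1, 0.5, 2, -3), c(0, 0, 1, 1))
```

Two of the four predictions are wrong here, giving a value of 0.5.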
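stratified: Logical flag indicating whether sampling of folds should be stratified by the values of outcome labels. For real-valued labels in regression objectives, stratification will be done by discretizing the labels into up to 5 buckets beforehand. If passing "auto", it will be set to TRUE if the objective in params is one of XGBoost's built-in classification objectives, and to FALSE otherwise. This parameter is ignored when data has a group field; in that case, the splitting is based on whole groups (which might make the folds have different sizes). Value TRUE is not supported for custom objectives.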
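The base-R sketch below shows one way stratified fold assignment can work: rows are grouped by label (or, for real-valued labels, by quantile bucket), shuffled within each group, and dealt round-robin across folds. The quantile bucketing is an assumption about how "discretizing into up to 5 buckets" might be done; this is not xgboost's implementation, and stratified_folds is a hypothetical name.

```r
stratified_folds <- function(label, nfold, n_buckets = 5) {
  if (is.numeric(label) && length(unique(label)) > n_buckets) {
    # Real-valued labels: discretize into (up to) n_buckets quantile buckets.
    probs <- seq(0, 1, length.out = n_buckets + 1)
    breaks <- unique(quantile(label, probs = probs, names = FALSE))
    strata <- cut(label, breaks = breaks, include.lowest = TRUE)
  } else {
    # Class labels (or few distinct values): each value is its own stratum.
    strata <- factor(label)
  }
  fold_id <- integer(length(label))
  offset <- 0L
  for (lev in levels(strata)) {
    idx <- which(strata == lev)
    idx <- idx[sample.int(length(idx))]  # shuffle within the stratum
    # Deal rows round-robin across folds, continuing the rotation from the
    # previous stratum so that total fold sizes stay balanced.
    fold_id[idx] <- (offset + seq_along(idx) - 1L) %% nfold + 1L
    offset <- offset + length(idx)
  }
  # Test indices per fold, the same shape the `folds` argument expects.
  unname(split(seq_along(label), factor(fold_id, levels = seq_len(nfold))))
}

set.seed(42)
lab <- rep(c(0, 1), times = c(60, 40))
f <- stratified_folds(lab, nfold = 5)
sapply(f, function(i) mean(lab[i]))  # share of positives per fold
```

With 60 negatives and 40 positives, each of the five folds receives 12 negatives and 8 positives, so the class balance of the full data is preserved in every fold.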
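folds: List with pre-defined CV folds (each element must be a vector of test fold's indices).
When folds are supplied, the nfold and stratified parameters are ignored. If data has a group field and the objective requires this field, each fold (list element) must additionally have two attributes named group_test and group_train, which hold the groups to assign to the resulting DMatrices.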
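As an illustration of the expected format, the sketch below builds a folds list by hand: a plain list whose elements are test-fold row indices. Contiguous blocks are used here as a hypothetical choice for time-ordered rows; the sizes are arbitrary.

```r
n <- 12
nfold <- 3
# Contiguous blocks of row indices (e.g. for time-ordered rows):
# rows 1-4 form fold 1, rows 5-8 fold 2, rows 9-12 fold 3.
block_id <- ceiling(seq_len(n) / (n / nfold))
folds <- unname(split(seq_len(n), block_id))
# The list can then be passed to xgb.cv() through its `folds` argument.
str(folds)
```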
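train_folds: List specifying which indices to use for training. If NULL (the default), all indices not specified in folds will be used for training. This is not supported when data has a group field.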
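A minimal sketch contrasting the default training indices (every row outside the test fold) with a custom train_folds list. The forward-chaining scheme, in which each fold trains only on rows that precede its test block, is a hypothetical use case chosen for illustration.

```r
n <- 12
folds <- list(5:8, 9:12)  # test indices per fold

# Default behaviour: train on every row that is not in the test fold.
default_train <- lapply(folds, function(test_idx) setdiff(seq_len(n), test_idx))

# Hypothetical custom choice: train only on rows before the test block.
train_folds <- lapply(folds, function(test_idx) seq_len(min(test_idx) - 1L))
str(train_folds)
```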
verbose: If 0, xgboost will stay silent. If 1, it will print information about performance.
If 2, some additional information will be printed out.
Note that setting verbose > 0 automatically engages the xgb.cb.print.evaluation() callback.
print_every_n: When passing verbose > 0, evaluation logs are printed every nth iteration according to the value passed here. Only has an effect when verbose > 0.
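early_stopping_rounds: Number of boosting rounds after which training will be stopped
if there is no improvement in performance (as measured by the evaluation metric that is
supplied or selected by default for the objective) on the test folds.
If NULL, the early stopping function is not triggered. If set to an integer k, training will stop if the performance doesn't improve for k rounds. Setting this parameter engages the xgb.cb.early.stop() callback.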
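The stopping rule can be sketched in a few lines of base R: track the best score seen so far and stop once k rounds pass without improvement. This mirrors the rule as described, not the code of the early stopping callback; early_stop_point and the score sequences are hypothetical, and the maximize flag plays the role of the maximize argument.

```r
early_stop_point <- function(scores, k, maximize = FALSE) {
  best_iter <- 1L
  for (i in seq_along(scores)) {
    better <- if (maximize) scores[i] > scores[best_iter] else scores[i] < scores[best_iter]
    if (better) best_iter <- i
    # Stop once k rounds have passed without beating the best score.
    if (i - best_iter >= k) {
      return(list(stop_iter = i, best_iteration = best_iter))
    }
  }
  list(stop_iter = length(scores), best_iteration = best_iter)
}

# Test-fold RMSE by round (lower is better), with k = 2.
early_stop_point(c(0.50, 0.40, 0.35, 0.36, 0.37, 0.30), k = 2)
```

Training stops at round 5 with round 3 as the best iteration, so the later improvement at round 6 is never reached; a larger k trades extra rounds for robustness against such temporary plateaus.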
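maximize: If custom_metric and early_stopping_rounds are set, then this parameter must be set as well. When it is TRUE, a larger evaluation score is considered better. This parameter is passed to the xgb.cb.early.stop() callback.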
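callbacks: A list of callback functions to perform various tasks during boosting.
See xgb.Callback(). Some of the callbacks are automatically created depending on the parameters' values. Users can provide either existing or their own callbacks to customize the training process.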
...: Not used. Some arguments that were part of this function in previous XGBoost versions are currently deprecated or have been renamed. If a deprecated or renamed argument is passed, the function will throw a warning (by default) and use its current equivalent instead. This warning will become an error if using the 'strict mode' option. If some additional argument is passed that is neither a current function argument nor a deprecated or renamed argument, a warning or error will be thrown depending on the 'strict mode' option. Important: ... will be removed in a future version, and all the current deprecation warnings will become errors. Please use only arguments that form part of the function signature.
The original sample is randomly partitioned into nfold equal-size subsamples.
Of the nfold subsamples, a single subsample is retained as the validation data for testing the model,
and the remaining nfold - 1 subsamples are used as training data.
The cross-validation process is then repeated nfold times, with each of the
nfold subsamples used exactly once as the validation data. In xgb.cv, the nfold models are trained
side by side for up to nrounds boosting iterations, and the evaluation metrics are aggregated across folds at every iteration.
All observations are used for both training and validation.
Adapted from https://en.wikipedia.org/wiki/Cross-validation_%28statistics%29
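The procedure above can be sketched generically in base R. In this sketch a linear model stands in for the learner (a simplification: xgb.cv trains boosted trees and records a score per boosting iteration rather than one score per fold), and it uses the sample standard deviation, which is an assumption about how the spread across folds is summarised. The function name k_fold_cv is hypothetical.

```r
k_fold_cv <- function(x, y, nfold) {
  n <- length(y)
  # Randomly partition row indices into nfold (near-)equal-size subsamples.
  fold_id <- sample(rep_len(seq_len(nfold), n))
  scores <- numeric(nfold)
  for (k in seq_len(nfold)) {
    test <- fold_id == k   # the held-out subsample
    train <- !test         # the remaining nfold - 1 subsamples
    fit <- lm(y ~ x, data = data.frame(x = x[train], y = y[train]))
    pred <- predict(fit, newdata = data.frame(x = x[test]))
    scores[k] <- sqrt(mean((pred - y[test])^2))  # test RMSE of fold k
  }
  c(mean = mean(scores), sd = sd(scores))
}

set.seed(1)
x <- runif(100)
y <- 2 * x + rnorm(100, sd = 0.1)
cv_summary <- k_fold_cv(x, y, nfold = 5)
print(cv_summary)
```

Every row is held out exactly once, so each observation contributes to exactly one test score and to nfold - 1 training sets.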
An object of class 'xgb.cv.synchronous' with the following elements:
call: Function call.
params: Parameters that were passed to the xgboost library. Note that it does not
capture parameters changed by the xgb.cb.reset.parameters() callback.
evaluation_log: Evaluation history stored as a data.table with the
first column corresponding to iteration number and the rest corresponding to the
CV-based evaluation means and standard deviations for the training and test CV-sets.
It is created by the xgb.cb.evaluation.log() callback.
niter: Number of boosting iterations.
nfeatures: Number of features in training data.
folds: The list of CV folds' indices - either those passed through the folds
parameter or randomly generated.
The object may also contain other elements added by callbacks. For example, a list cv_predict with
a sub-element pred is added by the xgb.cb.cv.predict() callback when passing prediction = TRUE
(this callback can also be passed manually under callbacks with different settings,
such as also saving the models created during cross validation). Likewise, a list early_stop
containing elements such as best_iteration is added when using the early stopping callback (xgb.cb.early.stop()).
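To illustrate how per-fold results could become the mean and standard deviation columns of evaluation_log, the sketch below aggregates a hypothetical fold-by-iteration matrix of test RMSE values. It uses a base data.frame instead of a data.table to stay dependency-free; the column names mirror the log's naming pattern, and the use of the population standard deviation is an assumption rather than a statement about xgboost's exact formula.

```r
# Hypothetical per-fold test RMSE: rows = folds, columns = iterations.
fold_scores <- rbind(
  c(0.50, 0.40, 0.35),
  c(0.52, 0.41, 0.33),
  c(0.48, 0.39, 0.37)
)

# Population standard deviation across folds (an assumption, see above).
pop_sd <- function(s) sqrt(mean((s - mean(s))^2))

eval_log <- data.frame(
  iter = seq_len(ncol(fold_scores)),
  test_rmse_mean = colMeans(fold_scores),
  test_rmse_std = apply(fold_scores, 2, pop_sd)
)
print(eval_log)
```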
library(xgboost)

data(agaricus.train, package = "xgboost")
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))

# 5-fold CV for up to 20 rounds; stops early if the last listed metric
# ("auc") does not improve on the test folds for 1 round, and keeps
# the test fold predictions.
cv <- xgb.cv(
  data = dtrain,
  nrounds = 20,
  early_stopping_rounds = 1,
  params = xgb.params(
    nthread = 2,
    max_depth = 3,
    objective = "binary:logistic"
  ),
  nfold = 5,
  metrics = list("rmse", "auc"),
  prediction = TRUE
)
print(cv)
print(cv, verbose = TRUE)

# Callbacks might add additional attributes, separated by the name of the callback
cv$early_stop$best_iteration
head(cv$cv_predict$pred)