xgb.cv: Cross Validation



Description

The cross-validation function of xgboost.

Usage

xgb.cv(params = list(), data, nrounds, nfold, label = NULL, missing = NA,
  prediction = FALSE, showsd = TRUE, metrics = list(), obj = NULL,
  feval = NULL, stratified = TRUE, folds = NULL, verbose = TRUE,
  print_every_n = 1L, early_stopping_rounds = NULL, maximize = NULL,
  callbacks = list(), ...)

Arguments

params

the list of parameters. Commonly used ones are:

  • objective: objective function; common ones are

    • reg:linear: linear regression

    • binary:logistic: logistic regression for binary classification

  • eta: step size of each boosting step

  • max_depth: maximum depth of a tree

  • nthread: number of threads used in training; if not set, all threads are used

See xgb.train for further details, and the demo/ directory for walkthrough examples in R.

data

takes an xgb.DMatrix, matrix, or dgCMatrix as the input.

nrounds

the maximum number of boosting iterations

nfold

the original dataset is randomly partitioned into nfold equal-sized subsamples.

label

vector of response values. Should be provided only when data is an R matrix.

missing

only used when the input is a dense matrix. Defaults to NA, which means that NA values are treated as missing by the algorithm. Sometimes 0 or another extreme value is used to represent missing values instead.
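As a minimal sketch, assume a hypothetical dense matrix x that marks missing entries with the sentinel -999 (both the matrix and the sentinel are illustrative, as is the response vector y mentioned in the comment). One option is to pass the sentinel through missing; the other is to recode it to NA first so that the default missing = NA applies.

```r
# Hypothetical dense feature matrix in which -999 marks a missing value.
x <- matrix(c(1.5, -999, 3.0,
              -999, 2.0, 4.5), nrow = 2, byrow = TRUE)

# Option 1: keep the sentinel and declare it via the missing argument,
# e.g. xgb.cv(..., data = x, label = y, missing = -999)  # y is hypothetical

# Option 2: recode the sentinel to NA so the default missing = NA applies.
x_na <- x
x_na[x_na == -999] <- NA
n_missing <- sum(is.na(x_na))   # number of entries now treated as missing
```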

prediction

A logical value indicating whether to return the test fold predictions from each CV model. This parameter engages the cb.cv.predict callback.
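To illustrate what "test fold predictions" means, the sketch below places each fold model's predictions back at the positions of that fold's held-out rows, yielding one out-of-fold prediction per observation. The helper assemble_oof and the toy folds are hypothetical; that xgb.cv returns its predictions laid out this way is an assumption, not something stated on this page.

```r
# Assemble out-of-fold predictions: every observation receives the prediction
# of the single model for which it was in the held-out (test) fold.
assemble_oof <- function(folds, fold_preds, n) {
  oof <- rep(NA_real_, n)
  for (k in seq_along(folds)) {
    oof[folds[[k]]] <- fold_preds[[k]]   # write fold k's predictions in place
  }
  oof
}

folds <- list(c(1, 4), c(2, 5), c(3, 6))                   # hypothetical test indices
fold_preds <- list(c(0.1, 0.4), c(0.2, 0.5), c(0.3, 0.6))  # hypothetical predictions
oof <- assemble_oof(folds, fold_preds, n = 6)
```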

showsd

boolean, whether to show the standard deviation of the cross-validation results

metrics

list of evaluation metrics to be used in cross validation; when it is not specified, the evaluation metric is chosen according to the objective function. Possible options are:

  • error: binary classification error rate

  • rmse: root mean square error

  • logloss: negative log-likelihood

  • auc: area under the ROC curve

  • merror: exact matching error, used to evaluate multi-class classification
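For reference, the metrics listed above can be written in a few lines of base R. This is an illustrative sketch, not xgboost's own implementation: the 0.5 threshold for error, the eps clipping in logloss, and the rank-based AUC formula are choices made here for clarity, and all function names are hypothetical.

```r
# Illustrative implementations for a binary problem:
# labels are 0/1 and probs are predicted probabilities.
metric_error <- function(labels, probs) mean((probs > 0.5) != labels)
metric_rmse  <- function(labels, probs) sqrt(mean((labels - probs)^2))
metric_logloss <- function(labels, probs) {
  eps <- 1e-15                              # clip to avoid log(0)
  p <- pmin(pmax(probs, eps), 1 - eps)
  -mean(labels * log(p) + (1 - labels) * log(1 - p))
}
# AUC via the rank-sum (Mann-Whitney) formulation; ties get average ranks.
metric_auc <- function(labels, probs) {
  r <- rank(probs)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}
# merror for multi-class: fraction of rows whose predicted class is wrong.
metric_merror <- function(labels, pred_class) mean(pred_class != labels)

labels <- c(0, 0, 1, 1)
probs  <- c(0.1, 0.6, 0.4, 0.9)
```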

obj

customized objective function. Given the predictions and dtrain, it returns the gradient and the second-order gradient (Hessian).
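A minimal sketch of such an objective, assuming the logistic loss and assuming that the predictions passed in are raw margins (untransformed scores). The math is kept in a base-R helper so it can be checked without xgboost; the wrapper logregobj has the (preds, dtrain) shape described above and only calls xgboost's getinfo when it is actually invoked. Both function names are hypothetical.

```r
# Gradient and Hessian of the logistic loss with respect to the raw margin.
logistic_grad_hess <- function(margin, labels) {
  p <- 1 / (1 + exp(-margin))   # sigmoid: margin -> probability
  list(grad = p - labels,       # first derivative of the loss
       hess = p * (1 - p))      # second derivative of the loss
}

# Wrapper with the (preds, dtrain) signature used for obj.
logregobj <- function(preds, dtrain) {
  logistic_grad_hess(preds, xgboost::getinfo(dtrain, "label"))
}

gh <- logistic_grad_hess(margin = c(0, 2), labels = c(1, 0))
```

It would then be passed as obj = logregobj; when a custom objective is used, a matching feval is usually supplied as well, because built-in metrics may expect transformed predictions.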

feval

customized evaluation function. Given the predictions and dtrain, it returns list(metric='metric-name', value='metric-value').
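A sketch of a matching evaluation function, assuming (as with the custom objective) that the predictions are raw margins, so the decision threshold is 0 rather than 0.5. The returned list follows the metric/value shape described above; margin_error and evalerror are hypothetical names.

```r
# Error rate on raw margins: a margin above 0 is read as the positive class.
margin_error <- function(preds, labels) {
  list(metric = "error", value = mean((preds > 0) != labels))
}

# Wrapper with the (preds, dtrain) signature used for feval.
evalerror <- function(preds, dtrain) {
  margin_error(preds, xgboost::getinfo(dtrain, "label"))
}

res <- margin_error(preds = c(-1.2, 0.3, 2.0, -0.1), labels = c(0, 0, 1, 1))
```

Because a lower error is better here, maximize = FALSE would be the matching setting if early stopping were enabled.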

stratified

a boolean indicating whether sampling of folds should be stratified by the values of outcome labels.
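The idea behind stratification can be sketched in base R: shuffle the indices within each label value and deal them round-robin across the folds, so that every fold gets roughly the class proportions of the full data. This is an illustration of the technique, not xgb.cv's internal fold generator; stratified_folds is hypothetical, and it calls set.seed, which changes the global RNG state.

```r
# Stratified assignment of observations to k folds.
stratified_folds <- function(labels, k, seed = 1) {
  set.seed(seed)
  fold_id <- integer(length(labels))
  for (cls in unique(labels)) {
    idx <- which(labels == cls)
    idx <- idx[sample.int(length(idx))]              # shuffle within the class
    fold_id[idx] <- rep_len(seq_len(k), length(idx)) # deal round-robin
  }
  # One vector of test indices per fold.
  split(seq_along(labels), factor(fold_id, levels = seq_len(k)))
}

labels <- c(rep(0, 8), rep(1, 4))
folds <- stratified_folds(labels, k = 4)   # each fold: two 0s and one 1
```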

folds

a list of pre-defined CV folds (each element must be a vector of the test fold's indices). When folds are supplied, the nfold and stratified parameters are ignored.
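One common reason to supply folds by hand is to keep related rows together. The sketch below assumes a hypothetical group id per row (for example, a customer) and assigns whole groups to folds round-robin, returning one vector of test indices per fold, which is the format described above. The helper name and the groups are illustrative.

```r
# Build pre-defined folds so that all rows sharing a group id land in the
# same test fold.
group_folds <- function(groups, k) {
  ug <- unique(groups)
  group_fold <- rep_len(seq_len(k), length(ug))   # assign groups round-robin
  names(group_fold) <- ug
  fold_id <- unname(group_fold[as.character(groups)])
  lapply(seq_len(k), function(f) which(fold_id == f))
}

groups <- c("a", "a", "b", "c", "c", "c", "d")
folds <- group_folds(groups, k = 2)
# e.g. xgb.cv(params, dtrain, nrounds = 10, folds = folds)
```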

verbose

boolean, whether to print the statistics during the process

print_every_n

Print evaluation messages for every n-th iteration when verbose > 0. Default is 1, which means all messages are printed. This parameter is passed to the cb.print.evaluation callback.

early_stopping_rounds

If NULL, the early stopping function is not triggered. If set to an integer k, training with a validation set will stop if the performance doesn't improve for k rounds. Setting this parameter engages the cb.early.stop callback.

maximize

If feval and early_stopping_rounds are set, then this parameter must be set as well. When it is TRUE, it means the larger the evaluation score the better. This parameter is passed to the cb.early.stop callback.
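The interplay of early_stopping_rounds and maximize can be sketched as a scan over per-round validation scores: keep the best score so far and stop once k rounds pass without improvement. This is a simplified illustration; the exact rules of cb.early.stop (ties, which iteration is reported) may differ, and early_stop_round is a hypothetical name.

```r
# Return the round at which training would stop and the best round so far.
early_stop_round <- function(scores, k, maximize = FALSE) {
  best <- if (maximize) -Inf else Inf
  best_iter <- 0L
  for (r in seq_along(scores)) {
    improved <- if (maximize) scores[r] > best else scores[r] < best
    if (improved) {
      best <- scores[r]
      best_iter <- r
    }
    if (r - best_iter >= k) {                 # k rounds without improvement
      return(list(stop_iter = r, best_iteration = best_iter))
    }
  }
  list(stop_iter = length(scores), best_iteration = best_iter)
}

test_rmse <- c(0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.30)
es <- early_stop_round(test_rmse, k = 3)     # stops at round 6; best is round 3
test_auc <- c(0.70, 0.80, 0.79, 0.81, 0.80, 0.80)
es_auc <- early_stop_round(test_auc, k = 2, maximize = TRUE)
```

Note that the improvement at round 7 of test_rmse is never seen, which is the usual trade-off of a small k.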

callbacks

a list of callback functions to perform various tasks during boosting. See callbacks. Some of the callbacks are created automatically depending on the parameters' values. Users can provide either existing callbacks or their own callback methods in order to customize the training process.

...

other parameters to pass to params.

Details

The original sample is randomly partitioned into nfold equal-sized subsamples.

Of the nfold subsamples, a single subsample is retained as the validation data for testing the model, and the remaining nfold - 1 subsamples are used as training data.

The cross-validation process is then repeated nfold times, with each of the nfold subsamples used exactly once as the validation data; each of the nfold models is boosted for up to nrounds iterations.

All observations are used for both training and validation.
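The partitioning described above can be sketched in base R: shuffle balanced fold labels, split the row indices by fold, and count how often each row is used. The helper kfold_indices is hypothetical and does not reproduce xgb.cv's own fold generator (for instance, it ignores stratification).

```r
# Randomly partition n observations into nfold (nearly) equal-size folds.
kfold_indices <- function(n, nfold, seed = 42) {
  set.seed(seed)
  fold_id <- rep_len(seq_len(nfold), n)[sample.int(n)]  # balanced, shuffled
  split(seq_len(n), factor(fold_id, levels = seq_len(nfold)))
}

n <- 10
folds <- kfold_indices(n, nfold = 5)

times_validated <- integer(n)
times_trained <- integer(n)
for (k in seq_along(folds)) {
  test_idx  <- folds[[k]]
  train_idx <- setdiff(seq_len(n), test_idx)
  # a model would be fit on train_idx and scored on test_idx here
  times_validated[test_idx] <- times_validated[test_idx] + 1L
  times_trained[train_idx]  <- times_trained[train_idx] + 1L
}
```

Every row ends up validated exactly once and used for training in the other nfold - 1 folds.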

Adapted from http://en.wikipedia.org/wiki/Cross-validation_%28statistics%29#k-fold_cross-validation

Value

An object of class xgb.cv.synchronous with the following elements:

Examples

library(xgboost)
data(agaricus.train, package = 'xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
# 5-fold CV for 3 rounds, reporting both RMSE and AUC on each fold
cv <- xgb.cv(data = dtrain, nrounds = 3, nthread = 2, nfold = 5,
             metrics = list("rmse", "auc"),
             max_depth = 3, eta = 1, objective = "binary:logistic")
print(cv)
print(cv, verbose = TRUE)  # more detailed printout

xgboost documentation built on May 29, 2017, 5:48 p.m.
