xgb.cv.lowmem: Low memory cross-validation wrapper for XGBoost

View source: R/cv.R


Low memory cross-validation wrapper for XGBoost

Description

This function performs similar operations to xgboost::xgb.cv, but in a memory-efficient manner. Unlike xgboost::xgb.cv, this version does not load all folds into memory from the start. Instead, it loads each fold into memory sequentially and trains each fold using xgboost::xgb.train. This allows larger datasets to be cross-validated.
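Conceptually, the per-fold loop works along the following lines (a minimal illustrative sketch only, not the actual implementation in R/cv.R; it glosses over how each fold's data is materialised and assumes folds is a list of test-row index vectors):

# Illustrative sketch: only one fold's DMatrices are held at a time
for (k in seq_along(folds)) {
  test_idx  <- folds[[k]]
  train_idx <- setdiff(seq_len(nrow(dtrain)), test_idx)
  dtest_k   <- xgboost::slice(dtrain, test_idx)
  dtrain_k  <- xgboost::slice(dtrain, train_idx)
  bst <- xgboost::xgb.train(params, dtrain_k, nrounds,
                            watchlist = list(train = dtrain_k, test = dtest_k))
  rm(dtrain_k, dtest_k)  # release this fold before loading the next
  gc()
}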

The main disadvantage of this function is that it is not possible to perform early stopping based on the results of all folds. The function does accept an early stopping argument, but it is applied to each fold separately. This means that different folds can (and should be expected to) train for different numbers of rounds.
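For example, each fold then selects its own stopping point (a hedged usage sketch; it assumes save_models = TRUE exposes the fitted boosters in a models field, and that each booster records best_iteration, as boosters fitted by xgboost::xgb.train with early stopping do):

cv <- xgb.cv.lowmem(data = dtrain,
                    params = list(objective = "binary:logistic"),
                    nrounds = 100,
                    nfold = 3,
                    early_stopping_rounds = 5,
                    save_models = TRUE)
# stopping points typically differ between folds
sapply(cv$models, function(m) m$best_iteration)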

This function also allows a train-test split (as opposed to multiple folds). This is done by providing a value of less than 1 to nfold, or a list containing a single fold to folds. This is not possible with xgboost::xgb.cv, but can be desirable if there is downstream processing that depends on an xgb.cv.synchronous object (which is the return object of both this function and xgboost::xgb.cv).
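For instance, a sketch of a single 80/20 train-test split (dtrain as in the Examples below):

cv <- xgb.cv.lowmem(data = dtrain,
                    params = list(objective = "binary:logistic"),
                    nrounds = 2,
                    nfold = 0.8)  # 80% of rows for training, the rest for testing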

Otherwise, where possible, this function returns the same data structure as xgboost::xgb.cv, with the exception of callbacks, which are not supported as a field within the return object. To save models, use the save_models argument rather than the cb.cv.predict(save_models = TRUE) callback.
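A short sketch of the save_models route (assuming the saved boosters appear in a models field, mirroring what xgboost::xgb.cv returns under cb.cv.predict(save_models = TRUE)):

cv <- xgb.cv.lowmem(data = dtrain,
                    params = list(objective = "binary:logistic"),
                    nrounds = 2,
                    nfold = 3,
                    save_models = TRUE)
length(cv$models)  # one fitted booster per fold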

Usage

xgb.cv.lowmem(
  params = list(),
  data,
  nrounds,
  nfold,
  label = NULL,
  missing = NA,
  prediction = FALSE,
  metrics = list(),
  obj = NULL,
  feval = NULL,
  stratified = TRUE,
  folds = NULL,
  train_folds = NULL,
  verbose = 1,
  print_every_n = 1L,
  early_stopping_rounds = NULL,
  maximize = NULL,
  save_models = FALSE,
  ...
)

Arguments

params

list of parameters passed to xgboost (see xgboost::xgb.train)

data

DMatrix or matrix

nrounds

number of training rounds

nfold

number of folds; alternatively, if a value of less than 1 is provided, it is used as the training proportion in a single train-test split

label

data labels (alternatively provide with DMatrix)

missing

handling of missing data (see xgb.cv)

prediction

return predictions

metrics

evaluation metrics

obj

custom objective function

feval

custom evaluation function

stratified

whether to use stratified folds

folds

custom folds (a list of test-row index vectors, as in xgboost::xgb.cv)

train_folds

custom training folds (as in xgboost::xgb.cv; if NULL, the complement of folds is used for training)

verbose

verbosity level

print_every_n

print every n iterations

early_stopping_rounds

early stopping rounds (applied to each fold)

maximize

whether to maximize the evaluation metric

save_models

whether to save the models

...

additional arguments passed to xgb.train
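For example, custom folds can be supplied directly (a sketch assuming each element of folds gives the test-row indices of one fold; nfold is still passed here since Usage lists it without a default, and with xgboost::xgb.cv a supplied folds list takes precedence):

cv <- xgb.cv.lowmem(data = dtrain,
                    params = list(objective = "binary:logistic"),
                    nrounds = 2,
                    nfold = 3,
                    folds = list(1:3, 4:6, 7:10))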

Value

xgb.cv.synchronous object

Examples

train <- list(data = matrix(rnorm(20), ncol = 2),
              label = rbinom(10, 1, 0.5))
dtrain <- xgboost::xgb.DMatrix(train$data, label = train$label, nthread = 1)
cv <- xgb.cv.lowmem(data = dtrain,
                    params = list(objective = "binary:logistic"),
                    nrounds = 2,
                    nfold = 3,
                    prediction = TRUE,
                    nthread = 1)
cv
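
Where supported, the returned object can be inspected like the result of xgboost::xgb.cv (field names assumed to follow xgb.cv.synchronous conventions, per the Description above):

cv$evaluation_log  # per-iteration evaluation metrics
str(cv$pred)       # out-of-fold predictions (prediction = TRUE above)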
