xgbOptimization: Bayesian optimization for XGBoost.

View source: R/xgbOptimization.R

Description

Maximizes a cross-validated XGBoost evaluation metric within the supplied parameter bounds. After the objective has been sampled a predetermined number of times, a Gaussian process is fit to the results. An acquisition function is then maximized to choose the most promising location of the global maximum of the user-defined XGBoost evaluation metric, and the metric is evaluated there. This process is repeated for a set number of iterations.
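The loop described above can be illustrated with a toy one-dimensional version in base R. This is a sketch under stated assumptions, not the bayesOpt implementation. The following are all hypothetical choices made for brevity:

- the squared-exponential kernel with length scale 0.2,
- the upper-confidence-bound acquisition with kappa = 2.576,
- the grid search over 501 candidate points,
- the helper names gp_posterior() and bayes_opt_1d().

The toy score f() stands in for the cross-validated XGBoost metric.

```r
# Hypothetical helper: Gaussian-process posterior at x_new given (x_obs, y_obs),
# squared-exponential kernel on inputs scaled to [0, 1].
gp_posterior <- function(x_obs, y_obs, x_new, len = 0.2, noise = 1e-6) {
  kern <- function(a, b) exp(-outer(a, b, "-")^2 / (2 * len^2))
  mu0 <- mean(y_obs)
  s0 <- sd(y_obs)
  z <- (y_obs - mu0) / s0                               # standardise the scores
  K_inv <- solve(kern(x_obs, x_obs) + diag(noise, length(x_obs)))
  Ks <- kern(x_new, x_obs)
  post_mean <- mu0 + s0 * as.vector(Ks %*% K_inv %*% z)
  post_var <- pmax(1 - rowSums((Ks %*% K_inv) * Ks), 0) # clip tiny negatives
  list(mean = post_mean, sd = s0 * sqrt(post_var))
}

# Hypothetical toy optimiser: initial design, then fit GP -> maximise UCB -> evaluate.
bayes_opt_1d <- function(f, lower, upper, init = 3, iters = 10,
                         kappa = 2.576, seed = 1) {
  set.seed(seed)
  to_unit <- function(x) (x - lower) / (upper - lower)
  # initial design: one random point in each of `init` equal strata
  x <- lower + (sample(init) - runif(init)) / init * (upper - lower)
  y <- vapply(x, f, numeric(1))
  grid <- seq(lower, upper, length.out = 501)           # candidate points
  for (i in seq_len(iters)) {
    post <- gp_posterior(to_unit(x), y, to_unit(grid))
    ucb <- post$mean + kappa * post$sd                  # upper confidence bound
    x_next <- grid[which.max(ucb)]                      # maximise the acquisition
    x <- c(x, x_next)
    y <- c(y, f(x_next))                                # evaluate the real objective
  }
  list(best_x = x[which.max(y)], best_y = max(y), x = x, y = y)
}

f <- function(x) -(x - 0.3)^2   # toy score with its maximum at x = 0.3
res <- bayes_opt_1d(f, lower = 0, upper = 1)
res$best_x
```

In the real function each evaluation of the objective is a full xgb.cv run. That cost is why a surrogate model that picks points deliberately is preferred over a dense grid search.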

Usage

xgbOptimization(
  dat,
  dat_label,
  bounds = list(),
  xgb_nfold = 5,
  xgb_nrounds = 20,
  xgb_early_stopping_rounds = 5,
  xgb_metric = "auc",
  xgb_thread = 8,
  opt_initPoints = length(bounds) + 1,
  opt_itersn = 10,
  opt_thread = 1,
  ...
)

Arguments

dat

A matrix or dgCMatrix object whose columns represent features and whose rows represent samples.

dat_label

A vector of class labels (the response), one per row (sample) of dat.

bounds

A named list of lower and upper bounds for the parameters passed to xgb.cv. The names of the list must be xgb.cv parameter names. Use the "L" suffix to mark integer parameters. A fixed parameter is given as a length-two vector with the same value twice, e.g. bounds = list(lambda = c(5, 5)).
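The bounds format can be checked before starting a long optimization run. The helper below, describe_bounds(), is hypothetical and not part of dropSplit. It only encodes the rules stated above:

- a named list of c(lower, upper) pairs,
- integer ranges written with the L suffix,
- fixed parameters given as equal bounds.

```r
# Hypothetical helper (not part of dropSplit): validate and summarise a
# bounds list written in the format described above.
describe_bounds <- function(bounds) {
  stopifnot(is.list(bounds), !is.null(names(bounds)), all(names(bounds) != ""))
  rows <- lapply(names(bounds), function(p) {
    b <- bounds[[p]]
    if (!is.numeric(b) || length(b) != 2L || b[1] > b[2]) {
      stop(sprintf("bounds$%s must be numeric c(lower, upper) with lower <= upper", p))
    }
    data.frame(
      param   = p,
      lower   = b[1],
      upper   = b[2],
      integer = is.integer(b),  # TRUE when written with the L suffix, e.g. c(1L, 5L)
      fixed   = b[1] == b[2],   # equal bounds mark a fixed parameter
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

bounds <- list(
  max_depth        = c(1L, 5L),  # integer range
  min_child_weight = c(0, 25),   # continuous range
  lambda           = c(5, 5)     # fixed at 5
)
info <- describe_bounds(bounds)
print(info)
```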

xgb_nfold

Number of cross-validation folds: the original dataset is randomly partitioned into nfold equal-sized subsamples.
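Here is a minimal base-R sketch of the partitioning idea. It assumes plain random (unstratified) assignment. make_folds() is a hypothetical helper; xgb.cv performs its own fold assignment internally, optionally stratified by label.

```r
# Hypothetical helper: randomly partition n sample indices into nfold folds
# whose sizes differ by at most one.
make_folds <- function(n, nfold = 5, seed = 1) {
  set.seed(seed)
  # assign fold ids 1..nfold cyclically, then shuffle them
  fold_id <- sample(rep_len(seq_len(nfold), n))
  split(seq_len(n), fold_id)
}

folds <- make_folds(n = 23, nfold = 5)
lengths(folds)
```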

xgb_nrounds

Maximum number of boosting iterations.

xgb_early_stopping_rounds

If NULL, the early stopping function is not triggered. If set to an integer k, training with a validation set will stop if the performance doesn't improve for k rounds. Setting this parameter engages the cb.early.stop callback.
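The early-stopping rule can be sketched in a few lines of base R for a metric that is maximized, such as auc. early_stop_round() is a hypothetical helper that mirrors the rule described above. It is not the cb.early.stop callback itself, and the per-round scores are made up for illustration.

```r
# Hypothetical helper: stop once the (maximised) score has not improved
# for k consecutive rounds; report where training stopped and the best round.
early_stop_round <- function(scores, k) {
  best <- -Inf
  best_iter <- 0L
  for (i in seq_along(scores)) {
    if (scores[i] > best) {
      best <- scores[i]
      best_iter <- i
    } else if (i - best_iter >= k) {
      return(list(stopped_at = i, best_iteration = best_iter))
    }
  }
  list(stopped_at = length(scores), best_iteration = best_iter)
}

auc_by_round <- c(0.80, 0.85, 0.86, 0.86, 0.855, 0.858, 0.859, 0.857)
early_stop_round(auc_by_round, k = 3)
```

With k = 3, rounds 4 to 6 fail to beat the round-3 score, so training stops at round 6.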

xgb_metric

An evaluation metric used in cross-validation; this is the score that is maximized. Possible options are:

  • auc: area under the ROC curve

  • aucpr: area under the precision-recall curve
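The ROC AUC equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counted as one half. The base-R sketch below computes it from ranks, using the Mann-Whitney formulation. roc_auc() is a hypothetical helper, not the routine xgboost uses internally, and it assumes 0/1 labels.

```r
# Hypothetical helper: ROC AUC from ranks (Mann-Whitney U / (n_pos * n_neg)).
roc_auc <- function(label, score) {
  pos <- label == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(score)  # average ranks, so tied scores count as one half
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

roc_auc(label = c(0, 0, 1, 1), score = c(0.1, 0.4, 0.35, 0.8))
```

Three of the four positive/negative pairs are ordered correctly, giving an AUC of 0.75.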

xgb_thread

Number of threads used by xgb.cv.

opt_initPoints

Number of points used to initialize the process. These points are chosen by Latin hypercube sampling within the supplied bounds.
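Latin hypercube sampling splits each parameter's range into opt_initPoints equal strata and uses every stratum exactly once, pairing strata across parameters at random. A minimal base-R sketch of that idea follows. lhs_points() is hypothetical and is not how bayesOpt draws its initial points. Rounding integer parameters (those given with the L suffix) is also an assumption.

```r
# Hypothetical helper: draw n Latin hypercube points inside `bounds`.
lhs_points <- function(bounds, n, seed = 1) {
  set.seed(seed)
  pts <- lapply(bounds, function(b) {
    # one uniform draw inside each of the n strata, in shuffled order
    u <- (sample(n) - runif(n)) / n
    x <- b[1] + u * (b[2] - b[1])
    if (is.integer(b)) as.integer(round(x)) else x  # honour the L suffix
  })
  as.data.frame(pts)
}

bounds <- list(max_depth = c(1L, 5L), subsample = c(0.25, 1))
lhs_points(bounds, n = 4)
```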

opt_itersn

The total number of times xgb.cv will be run after initialization.

opt_thread

Number of threads used by bayesOpt.

...

Other arguments passed to bayesOpt.

Value

A list of two objects:

bayesOpt

An object of class bayesOpt containing information about the process.

BestPars

A list containing the parameters which resulted in the highest returned Score.

Examples

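library("xgboost")
library("dropSplit")  # package that provides xgbOptimization()
data(agaricus.train, package = "xgboost")
dat <- agaricus.train$data         # dgCMatrix: rows are samples, columns are features
dat_label <- agaricus.train$label  # 0/1 class labels, one per row of dat
# max_depth is searched over integers (L suffix); the other two are continuous
bounds <- list(max_depth = c(1L, 5L), min_child_weight = c(0, 25), subsample = c(0.25, 1))
result <- xgbOptimization(dat = dat, dat_label = dat_label, bounds = bounds, opt_thread = 2)
result
result$BestPars  # parameter set with the highest cross-validated score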

zh542370159/dropSplit documentation built on June 19, 2022, 2:49 p.m.