gbts: Hyperparameter Search for Gradient Boosted Trees

View source: R/gbts.R

Description

This package implements hyperparameter optimization for Gradient Boosted Trees (GBT) on binary classification and regression problems. The current version provides two optimization methods:

  • Bayesian optimization (srch="bayes", the default): a predictive model of GBT performance is initialized with Latin Hypercube samples and then used to propose the hyperparameter setting to evaluate at each subsequent iteration.

  • Random search (srch="random"): hyperparameter settings are sampled at random from the search range defined by the lower and upper arguments.

Instead of returning a single GBT in the final output, an ensemble of GBTs is produced via the method of ensemble selection. GBTs are selected from a library into the ensemble with replacement, and the ensemble with the best validation performance is returned. The model library and validation performance come from the hyperparameter search described above: GBTs with different hyperparameter settings are built on the training dataset, and their performance is measured on the validation dataset via cross-validation (CV). Since selection from the library is done with replacement, a GBT may be selected more than once into the ensemble. This function returns an ensemble containing only the unique GBTs, with model weights calculated as the number of model duplicates divided by the ensemble size. Each unique GBT in the ensemble is re-trained on the full training data, and prediction is computed as the weighted average of predictions from the re-trained GBTs.
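As a rough illustration of the weighting and prediction scheme just described (a sketch only, not the package's internal code), suppose selected holds the indices of GBTs picked with replacement and retrained holds the corresponding models re-trained on the full training data; both names are hypothetical:

# 'selected' and 'retrained' are hypothetical objects, not part of the gbts API
ensz <- 100
tab <- table(selected)                  # how often each unique GBT was picked
weights <- as.numeric(tab) / ensz       # model weight = duplicates / ensemble size
unique_ids <- as.integer(names(tab))

# Weighted average of predictions from the unique, re-trained GBTs
pred <- Reduce(`+`, Map(function(id, w) w * predict(retrained[[id]], newx),
                        unique_ids, weights))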

Usage

gbts(x, y, w = rep(1, nrow(x)), nitr = 200, nlhs = floor(nitr/2),
  nprd = 5000, kfld = 10, srch = c("bayes", "random"), nbst = 100,
  ensz = 100, nwrk = 2, rpkg = c("gbm"), pfmc = c("acc", "dev", "ks",
  "auc", "roc", "mse", "rsq", "mae"), cdfx = "fpr", cdfy = "tpr",
  dspt = 0.5, lower = c(2, 10, 0.1, 0.1, 0.01, 50, 1), upper = c(10, 200,
  1, 1, 0.1, 1000, 10), quiet = FALSE)

Arguments

x

a data.frame of predictors. Categorical predictors represented as factors are accepted.

y

a vector of response values. For binary classification, y must contain values of 0 and 1. It is unnecessary to convert y to a factor variable. For regression, y must contain at least two unique values.

w

an optional vector of observation weights.

nitr

an integer specifying the number of hyperparameter settings to sample. For Bayesian optimization, nitr must be larger than nlhs.

nlhs

an integer specifying the number of Latin Hypercube samples (each sample is a hyperparameter setting) used to initialize the predictive model of GBT performance. This applies to Bayesian optimization only. After initialization, sequential search continues for nitr - nlhs iterations.
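As a rough sketch of this initialization step (assuming the CRAN lhs package; this is not the package's internal code), nlhs points can be drawn from the unit hypercube and rescaled to the search bounds:

library(lhs)
lower <- c(2, 10, 0.1, 0.1, 0.01, 50, 1)     # default lower bounds (see below)
upper <- c(10, 200, 1, 1, 0.1, 1000, 10)     # default upper bounds (see below)
nlhs <- 100
u <- randomLHS(nlhs, length(lower))          # nlhs samples in [0, 1]^7
settings <- sweep(u, 2, upper - lower, "*")  # stretch each dimension
settings <- sweep(settings, 2, lower, "+")   # shift to [lower, upper]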

nprd

an integer specifying the number of candidate hyperparameter settings at which GBT performance is estimated with the predictive model; the setting with the best estimated performance is used to train a GBT in the next iteration.

kfld

an integer specifying the number of folds for cross-validation.

srch

a character string specifying the search method: srch="bayes" performs Bayesian optimization (default), and srch="random" performs random search.

nbst

an integer specifying the number of bootstrap samples used to construct the predictive model of GBT performance.

ensz

an integer specifying the ensemble size, i.e., the number of GBTs selected into the ensemble. Since ensemble selection is done with replacement, the number of unique GBTs may be less than ensz, but the sum of model weights always equals ensz.

nwrk

an integer specifying the number of computing workers to use on a single machine.

rpkg

a character string indicating which R package implementation of GBT to use. Currently, only the gbm R package is supported.

pfmc

a character string specifying the performance metric used as the optimization objective. For binary classification, pfmc accepts:

  • "acc": accuracy

  • "dev": deviance

  • "ks": Kolmogorov-Smirnov (KS) statistic

  • "auc": area under the ROC curve. Use the cdfx and cdfy arguments to specify the cumulative distributions for the x-axis and y-axis of the ROC curve, respectively. The default ROC curve is given by true positive rate (y-axis) vs. false positive rate (x-axis).

  • "roc": rate on the y-axis of the ROC curve at a particular decision point (threshold) on the x-axis specified by the dspt argument. For example, if the desired performance metric is true positive rate at the 5% false positive rate, specify pfmc="roc", cdfx="fpr", cdfy="tpr", and dspt=0.05.

For regression, pfmc accepts:

  • "mse": mean squared error

  • "mae": mean absolute error

  • "rsq": r-squared (coefficient of determination)

cdfx

a character string specifying the cumulative distribution for the x-axis of the ROC curve. Supported values are:

  • "fpr": false positive rate

  • "fnr": false negative rate

  • "rpp": rate of positive prediction

cdfy

a character string specifying the cumulative distribution for the y-axis of the ROC curve. Supported values are:

  • "tpr": true positive rate

  • "tnr": true negative rate

dspt

a decision point (threshold) in [0, 1] for binary classification. If pfmc="acc", instances with probabilities <= dspt are predicted as negative, and those with probabilities > dspt are predicted as positive. If pfmc="roc", dspt is a threshold on the x-axis of the ROC curve, and the corresponding value on the y-axis is used as the performance metric. For example, if the desired performance metric is the true positive rate at the 5% false positive rate, specify pfmc="roc", cdfx="fpr", cdfy="tpr", and dspt=0.05.
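To make this example concrete, a hypothetical base-R helper (not part of this package) computing the true positive rate at a target false positive rate could look like:

tpr_at_fpr <- function(y, p, fpr_target = 0.05) {
  # Pick the score threshold so that about fpr_target of the negatives
  # fall above it (i.e., are falsely predicted positive)
  thresh <- quantile(p[y == 0], probs = 1 - fpr_target)
  mean(p[y == 1] > thresh)  # fraction of positives above that threshold
}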

lower

a numeric vector containing the minimum values of hyperparameters in the following order:

  • maximum tree depth

  • leaf node size

  • bag fraction

  • fraction of predictors to try for each split

  • shrinkage

  • number of trees

  • scale of weights for positive cases (for binary classification only)

upper

a numeric vector containing the maximum values of hyperparameters in the order above.
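For example, to restrict the search to shallower trees and fewer boosting iterations, both bounds can be narrowed (illustrative values; x and y as above):

model <- gbts(x, y,
              lower = c(3, 10, 0.1, 0.1, 0.01, 100, 1),  # depth >= 3, >= 100 trees
              upper = c(6, 200, 1, 1, 0.1, 500, 10))     # depth <= 6, <= 500 trees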

quiet

a logical; TRUE turns off the display of optimization progress in the console.

Value

A list containing the selected ensemble of re-trained GBTs, their model weights, and performance information from the hyperparameter search.

Author(s)

Waley W. J. Liang <wliang10@gmail.com>

References

Rich Caruana, Alexandru Niculescu-Mizil, Geoff Crew, and Alex Ksikes. 2004. Ensemble selection from libraries of models. In Proceedings of the 21st International Conference on Machine Learning (ICML '04). http://www.cs.cornell.edu/~alexn/papers/shotgun.icml04.revised.rev2.pdf

See Also

predict.gbts, comperf

Examples

## Not run: 
# Binary classification

# Load German credit data
data(german_credit)
train <- german_credit$train
test <- german_credit$test
target_idx <- german_credit$target_idx
pred_idx <- german_credit$pred_idx

# Train a GBT model with optimization on AUC
model <- gbts(train[, pred_idx], train[, target_idx], nitr = 200, pfmc = "auc")

# Predict on test data
yhat_test <- predict(model, test[, pred_idx])

# Compute AUC on test data
comperf(test[, target_idx], yhat_test, pfmc = "auc")


# Regression

# Load Boston housing data
data(boston_housing)
train <- boston_housing$train
test <- boston_housing$test
target_idx <- boston_housing$target_idx
pred_idx <- boston_housing$pred_idx

# Train a GBT model with optimization on MSE
model <- gbts(train[, pred_idx], train[, target_idx], nitr = 200, pfmc = "mse")

# Predict on test data
yhat_test <- predict(model, test[, pred_idx])

# Compute MSE on test data
comperf(test[, target_idx], yhat_test, pfmc = "mse")

## End(Not run)
