XGBModel: Extreme Gradient Boosting Models

View source: R/ML_XGBModel.R


Extreme Gradient Boosting Models

Description

Fits models with an efficient implementation of the gradient boosting framework from Chen & Guestrin (2016).

Usage

XGBModel(
  nrounds = 100,
  ...,
  objective = character(),
  aft_loss_distribution = "normal",
  aft_loss_distribution_scale = 1,
  base_score = 0.5,
  verbose = 0,
  print_every_n = 1
)

XGBDARTModel(
  eta = 0.3,
  gamma = 0,
  max_depth = 6,
  min_child_weight = 1,
  max_delta_step = .(0.7 * is(y, "PoissonVariate")),
  subsample = 1,
  colsample_bytree = 1,
  colsample_bylevel = 1,
  colsample_bynode = 1,
  alpha = 0,
  lambda = 1,
  tree_method = "auto",
  sketch_eps = 0.03,
  scale_pos_weight = 1,
  refresh_leaf = 1,
  process_type = "default",
  grow_policy = "depthwise",
  max_leaves = 0,
  max_bin = 256,
  num_parallel_tree = 1,
  sample_type = "uniform",
  normalize_type = "tree",
  rate_drop = 0,
  one_drop = 0,
  skip_drop = 0,
  ...
)

XGBLinearModel(
  alpha = 0,
  lambda = 0,
  updater = "shotgun",
  feature_selector = "cyclic",
  top_k = 0,
  ...
)

XGBTreeModel(
  eta = 0.3,
  gamma = 0,
  max_depth = 6,
  min_child_weight = 1,
  max_delta_step = .(0.7 * is(y, "PoissonVariate")),
  subsample = 1,
  colsample_bytree = 1,
  colsample_bylevel = 1,
  colsample_bynode = 1,
  alpha = 0,
  lambda = 1,
  tree_method = "auto",
  sketch_eps = 0.03,
  scale_pos_weight = 1,
  refresh_leaf = 1,
  process_type = "default",
  grow_policy = "depthwise",
  max_leaves = 0,
  max_bin = 256,
  num_parallel_tree = 1,
  ...
)

Arguments

nrounds

number of boosting iterations.

...

model parameters, as described below and in the XGBoost documentation, and arguments passed to XGBModel from the other constructors.

objective

optional character string defining the learning task and objective. If not specified, it is set automatically according to the type of the response variable, with the following values supported for each type.

factor:

"multi:softprob", "binary:logistic" (2 levels only)

numeric:

"reg:squarederror", "reg:logistic", "reg:gamma", "reg:tweedie", "rank:pairwise", "rank:ndcg", "rank:map"

PoissonVariate:

"count:poisson"

Surv:

"survival:aft", "survival:cox"

The first values listed are the defaults for the corresponding response types.
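For example, the default for a positive numeric response could be overridden with the gamma objective. A minimal sketch (argument values are illustrative only):

## Gamma regression instead of the default "reg:squarederror"
model <- XGBTreeModel(nrounds = 50, objective = "reg:gamma")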

aft_loss_distribution

character string specifying a distribution for the accelerated failure time objective ("survival:aft") as "extreme", "logistic", or "normal".

aft_loss_distribution_scale

numeric scaling parameter for the accelerated failure time distribution.
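The following sketch combines these arguments in an accelerated failure time specification; a Surv response is assumed at fit time and the values are illustrative only:

model <- XGBModel(
  nrounds = 100,
  objective = "survival:aft",
  aft_loss_distribution = "logistic",
  aft_loss_distribution_scale = 1.5
)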

base_score

initial prediction score of all observations, global bias.

verbose

numeric value controlling the amount of output printed during model fitting, such that 0 = none, 1 = performance information, and 2 = additional information.

print_every_n

numeric value designating the fitting iterations at which to print output when verbose > 0.

eta

shrinkage of variable weights at each iteration to prevent overfitting.

gamma

minimum loss reduction required to split a tree node.

max_depth

maximum tree depth.

min_child_weight

minimum sum of observation weights required in a child node for a split to be made.

max_delta_step, tree_method, sketch_eps, scale_pos_weight, updater, refresh_leaf, process_type, grow_policy, max_leaves, max_bin, num_parallel_tree

other tree booster parameters.

subsample

subsample ratio of the training observations.

colsample_bytree, colsample_bylevel, colsample_bynode

subsample ratio of variables for each tree, level, or split.
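As an illustration, a more conservative tree booster than the defaults might be specified as follows (values are arbitrary, not tuned recommendations):

model <- XGBTreeModel(
  eta = 0.1,              # slower shrinkage than the 0.3 default
  max_depth = 4,          # shallower trees
  min_child_weight = 5,   # larger minimum node weight
  subsample = 0.8,        # sample 80% of observations per iteration
  colsample_bytree = 0.8  # sample 80% of variables per tree
)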

alpha, lambda

L1 and L2 regularization terms for variable weights.

sample_type, normalize_type

types of sampling and normalization algorithms for the DART booster.

rate_drop

rate at which to drop trees during the dropout procedure.

one_drop

integer indicating whether to drop at least one tree during the dropout procedure.

skip_drop

probability of skipping the dropout procedure during a boosting iteration.
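A sketch of the DART dropout arguments used together (illustrative values):

model <- XGBDARTModel(
  rate_drop = 0.1,  # drop 10% of trees in each dropout
  one_drop = 1,     # always drop at least one tree
  skip_drop = 0.5   # skip dropout in half of the iterations
)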

feature_selector, top_k

character string specifying the feature selection and ordering method, and number of top variables to select in the "greedy" and "thrifty" feature selectors.
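For instance, thrifty selection of a small number of variables with the linear booster might be specified as below; note that, per the XGBoost documentation, the "greedy" and "thrifty" selectors require the "coord_descent" updater (values are illustrative):

model <- XGBLinearModel(
  updater = "coord_descent",     # required by the greedy/thrifty selectors
  feature_selector = "thrifty",
  top_k = 5                      # order only the top 5 variables
)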

Details

Response types:

factor, numeric, PoissonVariate, Surv

Automatic tuning of grid parameters:
  • XGBModel: NULL

  • XGBDARTModel: nrounds, eta*, gamma*, max_depth, min_child_weight*, subsample*, colsample_bytree*, rate_drop*, skip_drop*

  • XGBLinearModel: nrounds, alpha, lambda

  • XGBTreeModel: nrounds, eta*, gamma*, max_depth, min_child_weight*, subsample*, colsample_bytree*

* excluded from grids by default
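
These grids are engaged by wrapping a constructor in the package's TunedModel function. A minimal sketch:

## Tune XGBTreeModel over its automatic grid parameters
model_fit <- fit(Species ~ ., data = iris, model = TunedModel(XGBTreeModel))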

The booster-specific constructor functions XGBDARTModel, XGBLinearModel, and XGBTreeModel are special cases of XGBModel which automatically set the XGBoost booster parameter. These are called directly in typical usage unless XGBModel is needed to specify a more general model.
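For illustration, the two calls below would be expected to specify the same tree boosting model, the second passing the XGBoost booster parameter through '...' (a sketch; the explicit booster argument is an assumption based on the parameter forwarding described above):

XGBTreeModel(nrounds = 100)
XGBModel(nrounds = 100, booster = "gbtree")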

Default argument values and further model details can be found in the source link above and the See Also section below.

In calls to varimp for XGBTreeModel, argument type may be specified as "Gain" (default) for the fractional contribution of each predictor to the total gain of its splits, as "Cover" for the number of observations related to each predictor, or as "Frequency" for the percentage of times each predictor is used in the trees. Variable importance is automatically scaled to range from 0 to 100. To obtain unscaled importance values, set scale = FALSE. See example below.

Value

MLModel class object.

See Also

xgboost, fit, resample

Examples


## Requires prior installation of suggested package xgboost to run

model_fit <- fit(Species ~ ., data = iris, model = XGBTreeModel)
varimp(model_fit, method = "model", type = "Frequency", scale = FALSE)
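
## Scaled "Gain" importance, the defaults for type and scale
varimp(model_fit, method = "model")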


