s_GBM: Gradient Boosting Machine [C, R, S]

View source: R/s_GBM.R

s_GBM    R Documentation

Description

Train a Gradient Boosting Machine (GBM) model for classification, regression, or survival analysis using gbm::gbm.fit.

Usage

s_GBM(
  x,
  y = NULL,
  x.test = NULL,
  y.test = NULL,
  weights = NULL,
  ifw = TRUE,
  ifw.type = 2,
  upsample = FALSE,
  downsample = FALSE,
  resample.seed = NULL,
  distribution = NULL,
  interaction.depth = 2,
  shrinkage = 0.01,
  bag.fraction = 0.9,
  n.minobsinnode = 5,
  n.trees = 2000,
  max.trees = 5000,
  force.n.trees = NULL,
  gbm.select.smooth = FALSE,
  n.new.trees = 500,
  min.trees = 50,
  failsafe.trees = 500,
  imetrics = FALSE,
  .gs = FALSE,
  grid.resample.params = setup.resample("kfold", 5),
  gridsearch.type = "exhaustive",
  metric = NULL,
  maximize = NULL,
  plot.tune.error = FALSE,
  n.cores = rtCores,
  relInf = TRUE,
  varImp = FALSE,
  offset = NULL,
  var.monotone = NULL,
  keep.data = TRUE,
  var.names = NULL,
  response.name = "y",
  checkmods = FALSE,
  group = NULL,
  plot.perf = FALSE,
  plot.res = ifelse(!is.null(outdir), TRUE, FALSE),
  plot.fitted = NULL,
  plot.predicted = NULL,
  print.plot = FALSE,
  plot.theme = rtTheme,
  x.name = NULL,
  y.name = NULL,
  question = NULL,
  verbose = TRUE,
  trace = 0,
  grid.verbose = verbose,
  gbm.fit.verbose = FALSE,
  outdir = NULL,
  save.gridrun = FALSE,
  save.res = FALSE,
  save.res.mod = FALSE,
  save.mod = ifelse(!is.null(outdir), TRUE, FALSE)
)

Arguments

x

Numeric vector or matrix / data frame of features, i.e. independent variables.

y

Numeric vector of outcome, i.e. dependent variable.

x.test

Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in x.

y.test

Numeric vector of testing set outcome.

weights

Numeric vector: Weights for cases. For classification, weights takes precedence over ifw: if weights are provided, ifw is not used. Leave weights = NULL if using ifw = TRUE.

ifw

Logical: If TRUE, apply inverse frequency weighting (for Classification only). Note: If weights are provided, ifw is not used.

ifw.type

Integer 0, 1, 2:
1: class.weights as in 0, divided by min(class.weights)
2: class.weights as in 0, divided by max(class.weights)
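The rescaling done by ifw.type can be illustrated with a short base R sketch. The definition of type 0 is not given on this page, so the sketch assumes type 0 weights are proportional to the inverse class counts 1 / n_k. For types 1 and 2 any constant factor in that definition cancels out, so their results do not depend on the assumption. The function name ifw_weights is hypothetical and this is not the rtemis implementation.

```r
# Hypothetical sketch of per-case inverse frequency weights (not rtemis code).
# ASSUMPTION: type 0 weights are proportional to the inverse class counts 1 / n_k.
# Types 1 and 2 divide those weights by their min or max, so any constant
# factor in the type 0 definition cancels out for those two types.
ifw_weights <- function(y, type = 2) {
  y <- droplevels(as.factor(y))
  class.weights <- 1 / as.numeric(table(y))
  names(class.weights) <- levels(y)
  if (type == 1) class.weights <- class.weights / min(class.weights)
  if (type == 2) class.weights <- class.weights / max(class.weights)
  # Map each case to the weight of its class
  unname(class.weights[as.character(y)])
}

y <- factor(c("a", "a", "a", "b"))
ifw_weights(y, type = 1)  # majority cases weighted 1, minority case weighted 3
ifw_weights(y, type = 2)  # majority cases weighted 1/3, minority case weighted 1
```

With type 1 the minority class is upweighted relative to a majority weight of 1; with type 2 the majority class is downweighted relative to a minority weight of 1.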

upsample

Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: If the majority class is more than twice the size of the class being upsampled, upsample samples with replacement, which introduces randomness.

downsample

Logical: If TRUE, downsample the majority class to match the size of the minority class.

resample.seed

Integer: If provided, used to set the random seed during upsampling. Default = NULL (random seed).
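The effect of upsample, downsample, and resample.seed on the training rows can be sketched in base R as a function returning row indices. This is a simplified, hypothetical illustration (balance_idx is not an rtemis function): downsampling draws each class without replacement down to the minority size, while this upsampling variant keeps every case and adds draws with replacement up to the majority size, which does not reproduce the exact replacement rule described for upsample.

```r
# Hypothetical sketch: row indices that balance outcome classes (not rtemis code).
# "down": sample each class without replacement down to the minority class size.
# "up":   keep every case, then add cases drawn with replacement until each
#         class matches the majority class size (a simplification).
balance_idx <- function(y, method = c("down", "up"), seed = NULL) {
  method <- match.arg(method)
  if (!is.null(seed)) set.seed(seed)  # plays the role of resample.seed
  y <- droplevels(as.factor(y))
  by.class <- split(seq_along(y), y)
  sizes <- lengths(by.class)
  n.target <- if (method == "down") min(sizes) else max(sizes)
  idx <- lapply(by.class, function(i) {
    if (method == "down") {
      # Index into i so that single-element classes are handled correctly
      i[sample.int(length(i), n.target)]
    } else {
      c(i, i[sample.int(length(i), n.target - length(i), replace = TRUE)])
    }
  })
  sort(unname(unlist(idx)))
}

y <- factor(c(rep("a", 6), rep("b", 2)))
table(y[balance_idx(y, "down", seed = 42)])  # 2 cases of each class
table(y[balance_idx(y, "up", seed = 42)])    # 6 cases of each class
```

Fixing the seed makes the selected rows reproducible across runs, which is the purpose resample.seed serves.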

distribution

Character: Distribution of the response variable. See gbm::gbm

interaction.depth

[gS] Integer: Interaction depth.

shrinkage

[gS] Float: Shrinkage (learning rate).
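In the standard gradient boosting update, each new tree h_m, fit to the negative gradient of the loss at iteration m, is scaled by the shrinkage \nu before being added to the current model F_{m-1}:

```latex
F_m(x) = F_{m-1}(x) + \nu \, h_m(x), \qquad 0 < \nu \le 1
```

A smaller \nu generally needs more trees to reach a similar fit, which is why a small shrinkage such as the default 0.01 is paired with a large n.trees.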

bag.fraction

[gS] Float (0, 1): Fraction of cases to use to train each tree. Helps avoid overfitting.

n.minobsinnode

[gS] Integer: Minimum number of observations allowed in each terminal node.

n.trees

Integer: Initial number of trees to fit.

max.trees

Integer: Maximum number of trees to fit.

force.n.trees

Integer: If specified, use this number of trees instead of tuning the number of trees.

gbm.select.smooth

Logical: If TRUE, smooth the validation error curve.

n.new.trees

Integer: Number of new trees to train if stopping criteria have not been met.

min.trees

Integer: Minimum number of trees to fit.

failsafe.trees

Integer: If tuning fails to find n.trees, use this number instead.

imetrics

Logical: If TRUE, save extra$imetrics with n.trees, depth, and n.nodes.

.gs

Internal use only.

grid.resample.params

List: Output of setup.resample defining grid search parameters.

gridsearch.type

Character: Type of grid search to perform: "exhaustive" or "randomized".
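How [gS] vectors turn into a tuning grid can be sketched with base R's expand.grid: every row is one hyperparameter combination to be evaluated with the resampling scheme. The sketch assumes that "randomized" search evaluates a random subset of the rows of the exhaustive grid; the exact sampling rule rtemis uses is not described on this page, and the object names are illustrative only.

```r
# Hypothetical sketch of building a tuning grid from [gS] arguments.
gs.params <- list(interaction.depth = c(2, 3),
                  shrinkage = c(0.01, 0.1),
                  bag.fraction = c(0.8, 0.9),
                  n.minobsinnode = 5)

# "exhaustive": every combination, 2 * 2 * 2 * 1 = 8 rows
full.grid <- expand.grid(gs.params)

# "randomized" (assumed behavior): evaluate only a random subset of rows
set.seed(2024)
random.grid <- full.grid[sample.int(nrow(full.grid), 3), ]
```

Passing vectors such as interaction.depth = c(2, 3) directly to s_GBM triggers this kind of grid search automatically, as described under Details.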

metric

Character: Metric to minimize during grid search, or to maximize if maximize = TRUE. Default = NULL, which results in "Balanced Accuracy" for Classification, "MSE" for Regression, and "Coherence" for Survival Analysis.

maximize

Logical: If TRUE, metric will be maximized if grid search is run.

plot.tune.error

Logical: If TRUE, plot the tuning error curve.

n.cores

Integer: Number of cores to use.

relInf

Logical: If TRUE (Default), estimate variables' relative influence.

varImp

Logical: If TRUE, estimate variable importance by permutation (as in random forests; noted as experimental in gbm). This takes longer than the default relative influence; the two measures are highly correlated.
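The idea behind permutation importance is model-agnostic: shuffle one feature, re-predict, and measure how much the error increases. The sketch below uses lm() on simulated data as a stand-in model; it is not gbm's own implementation, and permutation_importance is a hypothetical helper name.

```r
# Hypothetical sketch of permutation importance for any model with predict():
# importance of a feature = increase in MSE after shuffling that feature.
permutation_importance <- function(model, x, y, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  mse <- function(d) mean((y - predict(model, newdata = d))^2)
  baseline <- mse(x)
  sapply(names(x), function(v) {
    x.perm <- x
    x.perm[[v]] <- sample(x.perm[[v]])  # break the link between v and y
    mse(x.perm) - baseline
  })
}

# Simulated data: y depends on "signal" only
set.seed(1)
x <- data.frame(signal = rnorm(200), noise = rnorm(200))
y <- 3 * x$signal + rnorm(200, sd = 0.5)
fit <- lm(y ~ ., data = cbind(x, y = y))
imp <- permutation_importance(fit, x, y, seed = 2)
imp  # "signal" shows a large MSE increase, "noise" a value near zero
```

Each feature requires a full extra round of predictions, which is why this is slower than gbm's relative influence.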

offset

Numeric vector of offset values, passed to gbm::gbm.fit.

var.monotone

Integer vector with values 0, 1, -1 and length = N features. Used to define monotonicity constraints. 0: no constraint, 1: increasing, -1: decreasing.
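Since var.monotone is matched to features by position, it can help to build it from feature names. The helper below is a hypothetical convenience function, not part of rtemis or gbm; the column names are illustrative.

```r
# Hypothetical helper: build a var.monotone vector from feature names.
make_var_monotone <- function(x, increasing = character(0),
                              decreasing = character(0)) {
  stopifnot(all(c(increasing, decreasing) %in% colnames(x)))
  mono <- setNames(integer(ncol(x)), colnames(x))  # 0 = no constraint
  mono[increasing] <- 1L                            # monotone increasing
  mono[decreasing] <- -1L                           # monotone decreasing
  unname(mono)                                      # same order as columns of x
}

x <- data.frame(age = c(50, 61, 72), dose = c(1, 2, 3), noise = c(0.2, 0.1, 0.4))
make_var_monotone(x, increasing = "dose", decreasing = "age")
```

The result (-1, 1, 0 for age, dose, noise) can then be passed as var.monotone.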

plot.fitted

Logical: If TRUE, plot True (y) vs Fitted.

plot.predicted

Logical: If TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test.

print.plot

Logical: If TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted.

plot.theme

Character: "zero", "dark", "box", "darkbox"

x.name

Character: Name for feature set.

y.name

Character: Name for outcome.

question

Character: the question you are attempting to answer with this model, in plain language.

verbose

Logical: If TRUE, print summary to screen.

grid.verbose

Logical: Passed to gridSearchLearn.

outdir

Character: If defined, save the log, plots, and an RDS file of the complete output to this directory.

save.gridrun

Logical: If TRUE, save grid search models.

save.res.mod

Logical: If TRUE, save the gbm model for each grid run. For diagnostic purposes only: object size adds up quickly.

save.mod

Logical: If TRUE, save all output to an RDS file in outdir. save.mod is TRUE by default if an outdir is defined. If set to TRUE and no outdir is defined, outdir defaults to paste0("./s.", mod.name).

Details

Early stopping is implemented by first fitting n.trees, then checking the (optionally smoothed) validation error curve and adding n.new.trees at a time as needed, until the validation error stops decreasing or max.trees is reached. [gS] in an argument description indicates that a vector of values can be passed, in which case grid search is performed automatically using the resampling scheme defined by grid.resample.params.
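The early-stopping loop can be sketched independently of gbm. In this hypothetical sketch, valid_error(k) is an assumed callback returning the validation error curve (one value per tree) of a model with k trees; the use of lowess() for gbm.select.smooth is an assumed choice of smoother, and min.trees and failsafe.trees are not modeled. It is not the rtemis implementation.

```r
# Hypothetical sketch of the early-stopping loop (not the rtemis implementation).
# valid_error(k): assumed callback returning the validation error per tree
# for a model with k trees.
select_n_trees <- function(valid_error, n.trees = 2000, n.new.trees = 500,
                           max.trees = 5000, smooth = FALSE) {
  repeat {
    err <- valid_error(n.trees)
    # Optional smoothing of the error curve; lowess is an assumed smoother
    if (smooth) err <- stats::lowess(seq_along(err), err)$y
    best <- which.min(err)
    # Stop when the minimum lies before the last tree (error no longer
    # decreasing) or when max.trees has been reached
    if (best < n.trees || n.trees >= max.trees) return(best)
    n.trees <- min(n.trees + n.new.trees, max.trees)
  }
}

# Toy curve with its minimum at tree 2600: evaluates 2000, 2500, then 3000 trees
select_n_trees(function(k) (seq_len(k) - 2600)^2)
```

If the error keeps decreasing all the way, the loop stops at max.trees and returns that value.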

This function includes a workaround for when gbm.fit fails: if an error is detected, gbm.fit is rerun until it succeeds, and the procedure then continues normally.
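A rerun-on-error workaround of this kind can be sketched with tryCatch. The wrapper name retry_until_success and its max.tries cap are hypothetical additions: the cap is a safeguard against looping forever, while the description above reruns until success.

```r
# Hypothetical rerun-on-error wrapper (not the rtemis implementation).
# max.tries is an added safeguard; the source reruns until success.
retry_until_success <- function(f, max.tries = 10) {
  for (i in seq_len(max.tries)) {
    res <- tryCatch(list(ok = TRUE, value = f()),
                    error = function(e) list(ok = FALSE, value = conditionMessage(e)))
    if (res$ok) return(res$value)
  }
  stop("All ", max.tries, " attempts failed; last error: ", res$value)
}

# A stand-in for a fitting call that fails twice, then succeeds
calls <- 0
flaky_fit <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("transient failure")
  "fitted"
}
retry_until_success(flaky_fit)
```

Wrapping the result in a list keeps a legitimate NULL return value distinguishable from a failed attempt.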

Author(s)

E.D. Gennatas

See Also

train_cv for external cross-validation

Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_Isotonic(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()

Other Tree-based methods: s_AdaBoost(), s_AddTree(), s_BART(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GLMTree(), s_H2OGBM(), s_H2ORF(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MLRF(), s_RF(), s_RFSRC(), s_Ranger(), s_XGBoost(), s_XRF()

Other Ensembles: s_AdaBoost(), s_RF(), s_Ranger()


egenn/rtemis documentation built on Dec. 17, 2024, 6:16 p.m.