s_AddTree: Additive Tree: Tree-Structured Boosting [C]

View source: R/s_AddTree.R

s_AddTree    R Documentation

Additive Tree: Tree-Structured Boosting [C]

Description

Train an Additive Tree model

Usage

s_AddTree(
  x,
  y = NULL,
  x.test = NULL,
  y.test = NULL,
  x.name = NULL,
  y.name = NULL,
  weights = NULL,
  update = c("exponential", "polynomial"),
  min.update = ifelse(update == "polynomial", 0.035, 1000),
  min.hessian = 0.001,
  min.membership = 1,
  steps.past.min.membership = 0,
  gamma = 0.8,
  max.depth = 30,
  learning.rate = 0.1,
  ifw = TRUE,
  ifw.type = 2,
  upsample = FALSE,
  downsample = FALSE,
  resample.seed = NULL,
  imetrics = TRUE,
  grid.resample.params = setup.resample("kfold", 5),
  metric = "Balanced Accuracy",
  maximize = TRUE,
  rpart.params = NULL,
  match.rules = TRUE,
  print.plot = FALSE,
  plot.fitted = NULL,
  plot.predicted = NULL,
  plot.theme = rtTheme,
  question = NULL,
  verbose = TRUE,
  prune.verbose = FALSE,
  trace = 1,
  grid.verbose = verbose,
  outdir = NULL,
  save.rpart = FALSE,
  save.mod = ifelse(!is.null(outdir), TRUE, FALSE),
  n.cores = rtCores
)
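
The default for min.update in the signature above refers to update, which itself defaults to a two-element vector. The sketch below uses a hypothetical function f (not part of rtemis) to show how R's lazy evaluation of default arguments can resolve such a default to a single value. It assumes update is reduced with match.arg() before min.update is first used; whether s_AddTree does exactly this internally is an assumption.

```r
# Hypothetical stand-in for the relevant part of the signature; not rtemis code.
f <- function(update = c("exponential", "polynomial"),
              min.update = ifelse(update == "polynomial", 0.035, 1000)) {
  # Reduce the choice vector to a single string before min.update is touched
  update <- match.arg(update)
  # The default expression is evaluated only now, using the matched value
  min.update
}

default_exp  <- f()                                # update resolves to "exponential"
default_poly <- f("polynomial")                    # default follows the chosen update
explicit     <- f("polynomial", min.update = 0.1)  # an explicit value overrides the default
```

If the default were evaluated before match.arg(), ifelse() would return one value per element of the choice vector, so the order of evaluation matters for this pattern.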

Arguments

x

Numeric vector or matrix / data frame of features, i.e. independent variables

y

Vector of outcome, i.e. dependent variable. For this function, the outcome must be a factor with two levels (see Details)

x.test

Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in x

y.test

Numeric vector of testing set outcome

x.name

Character: Name for feature set

y.name

Character: Name for outcome

weights

Numeric vector: Weights for cases. For classification, weights takes precedence over ifw: if weights are provided, ifw is not used. Leave weights = NULL if setting ifw = TRUE.

update

Character: "exponential" or "polynomial". Type of weight update. Default = "exponential"

min.update

Float: Minimum update for gradient step. Default = 0.035 if update is "polynomial", otherwise 1000

min.hessian

[gS] Float: Minimum second derivative to continue splitting. Default = .001

min.membership

Integer: Minimum number of cases in a node. Default = 1

steps.past.min.membership

Integer: Number of steps to take past min.membership. For testing. Default = 0

gamma

[gS] Float: acceleration factor = lambda/(1 + lambda). Default = .8

max.depth

[gS] Integer: maximum depth of the tree. Default = 30

learning.rate

[gS] Float: Learning rate for the Newton-Raphson step that updates the function values of the node. Default = 0.1

ifw

Logical: If TRUE, apply inverse frequency weighting (for Classification only). Note: If weights are provided, ifw is not used.

ifw.type

Integer: 0, 1, or 2.
1: class.weights as in 0, divided by min(class.weights)
2: class.weights as in 0, divided by max(class.weights)
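
The scaling applied by ifw.type 1 and 2 can be sketched in base R as follows. The helper class_weights is hypothetical, and the type-0 base weights (number of cases divided by class count) are an assumption, since this page does not define type 0; only the division by the minimum or maximum weight is taken from the description above.

```r
# Sketch only, not rtemis code. The type-0 base weights are an assumption
# (n / class count); types 1 and 2 rescale them as described above.
class_weights <- function(y, ifw.type = 2) {
  counts <- table(y)
  base <- as.numeric(length(y) / counts)  # assumed type-0 weights
  names(base) <- names(counts)
  switch(as.character(ifw.type),
    "0" = base,
    "1" = base / min(base),  # smallest weight becomes 1
    "2" = base / max(base),  # largest weight becomes 1
    stop("ifw.type must be 0, 1, or 2")
  )
}

y <- factor(c(rep("a", 80), rep("b", 20)))
w1 <- class_weights(y, ifw.type = 1)
w2 <- class_weights(y, ifw.type = 2)

# Expand class weights to one weight per case
case_weights <- w2[as.character(y)]
```

Under these assumptions the rarer class always receives the larger weight; the two types differ only in whether the weights are scaled to a minimum or a maximum of 1.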

upsample

Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class being upsampled, thereby introducing randomness

downsample

Logical: If TRUE, downsample majority class to match size of minority class

resample.seed

Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed)
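
As a rough illustration of what upsample, downsample, and resample.seed control, the base R sketch below balances a two-class outcome by resampling case indices. The helper balance_classes is hypothetical and simplified: it always samples with replacement when upsampling, which is not necessarily how rtemis implements it.

```r
# Hypothetical helper, not rtemis code: returns case indices that balance classes.
balance_classes <- function(y, method = c("upsample", "downsample"), seed = NULL) {
  method <- match.arg(method)
  if (!is.null(seed)) set.seed(seed)  # makes the resampling reproducible
  y <- droplevels(as.factor(y))
  idx_by_class <- split(seq_along(y), y)
  sizes <- lengths(idx_by_class)
  target <- if (method == "upsample") max(sizes) else min(sizes)
  idx <- lapply(idx_by_class, function(i) {
    if (length(i) == target) return(i)
    if (method == "upsample") {
      # keep all original cases and add resampled ones (with replacement)
      c(i, i[sample.int(length(i), target - length(i), replace = TRUE)])
    } else {
      # keep a random subset of the larger class (without replacement)
      i[sample.int(length(i), target)]
    }
  })
  sort(unlist(idx, use.names = FALSE))
}

y <- factor(c(rep("a", 8), rep("b", 2)))
up   <- balance_classes(y, "upsample", seed = 2024)
down <- balance_classes(y, "downsample", seed = 2024)
table(y[up])    # both classes at the majority size
table(y[down])  # both classes at the minority size
```

Fixing the seed gives the same resampled indices on repeated calls, which is the role resample.seed plays in the description above.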

imetrics

Logical: If TRUE, save interpretability metrics, i.e. N total nodes in tree and depth, in output. Default = TRUE

grid.resample.params

List: Output of setup.resample defining grid search parameters.

metric

Character: Metric to minimize during grid search, or to maximize if maximize = TRUE. Default = "Balanced Accuracy" (see Usage). If set to NULL, this results in "Balanced Accuracy" for Classification, "MSE" for Regression, and "Coherence" for Survival Analysis.

maximize

Logical: If TRUE, metric will be maximized if grid search is run.
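
To make the interplay of metric and maximize concrete, the sketch below computes balanced accuracy (the mean of per-class recall) for two sets of toy predictions and selects the better one. The helper names and the toy labels are illustrative assumptions, not rtemis internals.

```r
# Balanced accuracy: mean of per-class recall. Hypothetical helper, not rtemis code.
balanced_accuracy <- function(true, predicted) {
  true <- as.factor(true)
  predicted <- factor(predicted, levels = levels(true))
  recall <- vapply(
    levels(true),
    function(k) mean(predicted[true == k] == k),
    numeric(1)
  )
  mean(recall)
}

# Toy labels: 4 positive and 6 negative cases
true <- factor(c(rep("pos", 4), rep("neg", 6)), levels = c("pos", "neg"))

# Candidate A predicts the majority class for every case
pred_a <- rep("neg", 10)
# Candidate B gets 3 of 4 positives and 4 of 6 negatives right
pred_b <- c("pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "pos", "pos")

scores <- c(a = balanced_accuracy(true, pred_a),
            b = balanced_accuracy(true, pred_b))

# With maximize = TRUE the highest score wins; with FALSE the lowest would
maximize <- TRUE
best <- names(scores)[if (maximize) which.max(scores) else which.min(scores)]
```

Candidate A reaches 60% plain accuracy by ignoring the positive class, yet its balanced accuracy is only 0.5, which illustrates why a class-balanced metric is a sensible thing to maximize for imbalanced outcomes.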

rpart.params

List: rpart parameters, passed to the parms argument of rpart::rpart

match.rules

Logical: If TRUE, match cases to rules to get statistics per node, i.e. what percent of cases match each rule. If available, these are used by dplot3_addtree when plotting. Default = TRUE

print.plot

Logical: if TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted.

plot.fitted

Logical: if TRUE, plot True (y) vs Fitted

plot.predicted

Logical: if TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test

plot.theme

Character: "zero", "dark", "box", "darkbox"

question

Character: the question you are attempting to answer with this model, in plain language.

verbose

Logical: If TRUE, print summary to screen.

prune.verbose

Logical: If TRUE, print messages while pruning the tree.

trace

Integer: 0, 1, 2. The higher the number, the more verbose the output.

grid.verbose

Logical: Passed to gridSearchLearn

outdir

Path to output directory. If defined, will save Predicted vs. True plot, if available, as well as full model output, if save.mod is TRUE

save.rpart

Logical: passed to addtree

save.mod

Logical: If TRUE, save all output to an RDS file in outdir. save.mod is TRUE by default if an outdir is defined. If set to TRUE and no outdir is defined, outdir defaults to paste0("./s.", mod.name)

n.cores

Integer: Number of cores to use.

Details

This function is for binary classification. The outcome must be a factor with two levels; the first level is the 'positive' class. Ensure there are no missing values in the data and that variables are either numeric (including integers) or factors. Use preprocess as needed to impute and convert characters to factors.

Factor levels should not contain the "/" character (it is used to separate conditions in the addtree object).
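
The input requirements above can be checked before training with a few lines of base R. The helper check_addtree_input below is hypothetical (it is not an rtemis function) and simply returns a character vector describing any problems it finds.

```r
# Hypothetical pre-flight check for the requirements listed in Details.
check_addtree_input <- function(x, y) {
  x <- as.data.frame(x)
  problems <- character()
  if (!is.factor(y) || nlevels(y) != 2) {
    problems <- c(problems, "y must be a factor with two levels")
  }
  if (any(vapply(x, anyNA, logical(1))) || anyNA(y)) {
    problems <- c(problems, "missing values present")
  }
  ok_type <- vapply(x, function(col) is.numeric(col) || is.factor(col), logical(1))
  if (!all(ok_type)) {
    problems <- c(problems, paste(
      "not numeric or factor:", paste(names(x)[!ok_type], collapse = ", ")
    ))
  }
  # "/" in factor levels would clash with the condition separator
  has_slash <- vapply(
    x,
    function(col) is.factor(col) && any(grepl("/", levels(col), fixed = TRUE)),
    logical(1)
  )
  if (any(has_slash) || (is.factor(y) && any(grepl("/", levels(y), fixed = TRUE)))) {
    problems <- c(problems, "factor levels contain '/'")
  }
  problems
}

bad_x <- data.frame(
  age = c(30, 41, NA),
  group = factor(c("a/b", "c", "c")),
  id = c("p1", "p2", "p3"),
  stringsAsFactors = FALSE
)
y <- factor(c("yes", "no", "yes"))
bad <- check_addtree_input(bad_x, y)

good_x <- data.frame(age = c(30, 41, 55), group = factor(c("a", "c", "c")))
good <- check_addtree_input(good_x, y)
```

An empty result means none of the listed issues was detected; it does not guarantee that training will succeed.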

[gS] Indicates that more than one value can be supplied, which will result in grid search using internal resampling.

lambda = gamma/(1 - gamma)
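
The relation between gamma and lambda, and the size of a grid over [gS] arguments, can be sketched in base R as below. The helper names are hypothetical, and the fit count assumes one model per parameter combination per resample, which may not match the internal grid search exactly.

```r
# Conversions between the two parameterizations given on this page
gamma_to_lambda <- function(gamma) gamma / (1 - gamma)
lambda_to_gamma <- function(lambda) lambda / (1 + lambda)

# The default gamma = 0.8 corresponds to lambda of approximately 4
lambda_default <- gamma_to_lambda(0.8)

# Candidate values for two [gS] arguments; every combination is one grid row
grid <- expand.grid(gamma = c(0.5, 0.8, 0.9), max.depth = c(5, 30))
grid$lambda <- gamma_to_lambda(grid$gamma)

# With 5-fold resampling, as in the Usage default for grid.resample.params
n_folds <- 5
n_fits <- nrow(grid) * n_folds  # assumes one fit per combination per fold
```

Because lambda grows without bound as gamma approaches 1, evenly spaced gamma values near 1 correspond to very unevenly spaced lambda values.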

Value

Object of class rtMod

Author(s)

E.D. Gennatas

References

Jose Marcio Luna, Efstathios D Gennatas, Lyle H Ungar, Eric Eaton, Eric S Diffenderfer, Shane T Jensen, Charles B Simone, Jerome H Friedman, Timothy D Solberg, Gilmer Valdes. Building more accurate decision trees with the additive tree. Proc Natl Acad Sci U S A. 2019 Oct 1;116(40):19887-19893. doi: 10.1073/pnas.1816748116

See Also

Other Supervised Learning: s_AdaBoost(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_Isotonic(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()

Other Tree-based methods: s_AdaBoost(), s_BART(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GBM(), s_GLMTree(), s_H2OGBM(), s_H2ORF(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MLRF(), s_RF(), s_RFSRC(), s_Ranger(), s_XGBoost(), s_XRF()

Other Interpretable models: s_C50(), s_CART(), s_GLM(), s_GLMNET(), s_GLMTree(), s_LMTree()
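
Examples

A minimal sketch of a call, assuming rtemis is installed and that the installed version exports s_AddTree. The data are a two-class subset of the built-in iris dataset, and the train/test split and the max.depth value are arbitrary choices for illustration.

```r
# Two-class subset of iris; the outcome must be a factor with two levels,
# and the first level is treated as the 'positive' class.
dat <- droplevels(iris[iris$Species != "setosa", ])

# Hold out every fifth case for testing
test_idx <- seq(5, nrow(dat), by = 5)
x <- dat[-test_idx, 1:4]
y <- dat$Species[-test_idx]
x.test <- dat[test_idx, 1:4]
y.test <- dat$Species[test_idx]

# Run only if rtemis is available and exports s_AddTree in this version
if (requireNamespace("rtemis", quietly = TRUE) &&
    "s_AddTree" %in% getNamespaceExports("rtemis")) {
  mod <- rtemis::s_AddTree(
    x, y,
    x.test = x.test, y.test = y.test,
    max.depth = 5,    # single value for a [gS] argument
    verbose = FALSE
  )
  class(mod)
}
```

Supplying a vector for any [gS] argument, for example gamma = c(0.5, 0.8), would instead trigger a grid search as described in Details.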


egenn/rtemis documentation built on Dec. 17, 2024, 6:16 p.m.