s_LINAD: Linear Additive Tree (C, R)

View source: R/s_LINAD.R

s_LINAD R Documentation

Linear Additive Tree (C, R)

Description

Train a Linear Additive Tree for Regression or Binary Classification

Usage

s_LINAD(
  x,
  y = NULL,
  x.test = NULL,
  y.test = NULL,
  weights = NULL,
  max.leaves = 20,
  lookback = TRUE,
  force.max.leaves = NULL,
  learning.rate = 0.5,
  ifw = TRUE,
  ifw.type = 1,
  upsample = FALSE,
  downsample = FALSE,
  resample.seed = NULL,
  leaf.model = c("line", "spline"),
  gamlearner = "gamsel",
  gam.params = list(),
  nvmax = 3,
  gamma = 0.5,
  gamma.on.lin = FALSE,
  lin.type = c("glmnet", "forwardStepwise", "cv.glmnet", "lm.ridge", "allSubsets",
    "backwardStepwise", "glm", "solve", "none"),
  first.lin.type = "cv.glmnet",
  first.lin.learning.rate = 1,
  first.lin.alpha = 1,
  first.lin.lambda = NULL,
  cv.glmnet.nfolds = 5,
  which.cv.glmnet.lambda = "lambda.min",
  alpha = 1,
  lambda = 0.05,
  lambda.seq = NULL,
  minobsinnode.lin = 10,
  part.minsplit = 2,
  part.xval = 0,
  part.max.depth = 1,
  part.cp = 0,
  part.minbucket = 1,
  .rho = TRUE,
  rho.max = 1000,
  init = NULL,
  metric = "auto",
  maximize = NULL,
  grid.resample.params = setup.resample("kfold", 5),
  gridsearch.type = "exhaustive",
  save.gridrun = FALSE,
  select.leaves.smooth = FALSE,
  cluster = FALSE,
  keep.x = FALSE,
  simplify = TRUE,
  cxrcoef = FALSE,
  n.cores = rtCores,
  .preprocess = NULL,
  verbose = TRUE,
  grid.verbose = FALSE,
  plot.tuning = FALSE,
  verbose.predict = FALSE,
  trace = 1,
  x.name = NULL,
  y.name = NULL,
  question = NULL,
  outdir = NULL,
  print.plot = FALSE,
  plot.fitted = NULL,
  plot.predicted = NULL,
  plot.theme = rtTheme,
  save.mod = FALSE,
  .gs = FALSE
)

Arguments

x

Numeric vector or matrix / data frame of features, i.e. independent variables

y

Numeric vector of outcome, i.e. dependent variable

x.test

Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in x

y.test

Numeric vector of testing set outcome

weights

Numeric vector: Weights for cases. For classification, weights takes precedence over ifw: if weights are provided, ifw is not used. Leave NULL if setting ifw = TRUE.

max.leaves

Integer: Maximum number of terminal nodes to grow. Setting this to a value > 1 triggers cross-validation to find the best number of leaves. To force a given number of leaves without cross-validation, set force.max.leaves to an integer value.

lookback

Logical: If TRUE, use validation error to decide the best number of leaves to use.

force.max.leaves

Integer: If set, max.leaves is ignored and the tree will attempt to reach this number of leaves, without tuning the number of leaves.

learning.rate

[gS] Numeric: Learning rate for steps after the initial linear model

ifw

Logical: If TRUE, apply inverse frequency weighting (for Classification only). Note: If weights are provided, ifw is not used.

ifw.type

Integer: One of 0, 1, 2.
1: class.weights as in 0, divided by min(class.weights).
2: class.weights as in 0, divided by max(class.weights).

upsample

Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness.

downsample

Logical: If TRUE, downsample majority class to match size of minority class

resample.seed

Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed)

nvmax

[gS] Integer: Maximum number of features to use for lin.type "allSubsets", "forwardStepwise", or "backwardStepwise". Values greater than the number of features in x will be excluded.

gamma

[gS] Numeric: Soft weighting parameter. Weights of cases that do not belong to a node are multiplied by this amount

lin.type

Character: One of "glmnet", "forwardStepwise", "cv.glmnet", "lm.ridge", "allSubsets", "backwardStepwise", "glm", "solve", or "none" to not fit linear models. See lincoef for more.

first.lin.type

Character: The first linear model to fit on the root node. Same options as lin.type.

first.lin.alpha

Numeric: alpha for the first linear model, if first.lin.type is "glmnet" or "cv.glmnet"

lambda

[gS] Numeric: lambda value for lin.type glmnet, cv.glmnet, lm.ridge

minobsinnode.lin

[gS] Integer: Minimum number of observations needed to fit linear model

part.minsplit

[gS] Integer: Minimum number of observations in node to consider splitting

part.max.depth

Integer: Max depth for each tree model within the additive tree

part.cp

[gS] Numeric: A split must decrease complexity by at least this much to be considered

part.minbucket

[gS] Integer: Minimum number of observations allowed in child node to allow splitting

init

Initial value. Default = mean(y)

verbose

Logical: If TRUE, print summary to screen.

plot.tuning

Logical: If TRUE, plot validation error during gridsearch

trace

Integer: If higher than 0, will print more information to the console.

x.name

Character: Name for feature set

y.name

Character: Name for outcome

question

Character: the question you are attempting to answer with this model, in plain language.

outdir

Path to output directory. If defined, will save Predicted vs. True plot, if available, as well as full model output, if save.mod is TRUE

print.plot

Logical: if TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted.

plot.fitted

Logical: if TRUE, plot True (y) vs Fitted

plot.predicted

Logical: if TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test

plot.theme

Character: "zero", "dark", "box", "darkbox"

save.mod

Logical: If TRUE, save all output to an RDS file in outdir. save.mod is TRUE by default if an outdir is defined. If set to TRUE and no outdir is defined, outdir defaults to paste0("./s.", mod.name).

.gs

internal use only
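
The class-imbalance arguments above (ifw, ifw.type, downsample) can be illustrated with a small base R sketch. This is not the package's internal code. In particular, this page does not define the ifw.type = 0 weights, so the sketch assumes they are inverse class proportions. The variable names and the fixed seed are hypothetical, chosen only for illustration.

```r
# Illustrative sketch only; not rtemis internals.
# Imbalanced binary outcome: 80 cases of class "a", 20 of class "b"
y <- factor(c(rep("a", 80), rep("b", 20)))

# ASSUMPTION: the ifw.type = 0 class weights are inverse class proportions.
# This page does not define type 0, so treat this as a placeholder.
class_prop <- table(y) / length(y)
w0 <- 1 / as.numeric(class_prop)
names(w0) <- levels(y)

# ifw.type = 1: class weights as in 0, divided by min(class.weights)
w1 <- w0 / min(w0)

# ifw.type = 2: class weights as in 0, divided by max(class.weights)
w2 <- w0 / max(w0)

# Expand class weights to one weight per case
case_w1 <- unname(w1[as.character(y)])

# Downsampling: keep a random subset of the majority class equal in size
# to the minority class (a fixed seed makes the draw reproducible)
set.seed(42)
idx_major <- which(y == "a")
idx_minor <- which(y == "b")
keep <- c(sample(idx_major, length(idx_minor)), idx_minor)
y_down <- y[keep]
```

Under this assumption, type 1 leaves the majority class at weight 1 and scales the minority class up, while type 2 leaves the minority class at weight 1 and scales the majority class down. The ratio between the two class weights is the same in both cases.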

Details

The Linear Additive Tree trains a tree using a sequence of regularized linear models and splits. max.leaves sets an upper limit on the number of leaves rather than an exact number because, depending on the other parameters and the data, splitting may stop early.
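
To make the roles of gamma and learning.rate in such a sequence more concrete, here is a sketch of a single refinement step in base R. It is an illustration built on assumptions, not the package's algorithm: it uses unregularized lm in place of a regularized linear model, fixes the split point by hand instead of searching for it, and assumes each node's update is a weighted linear fit to the current residuals. All variable names are hypothetical.

```r
# Illustrative sketch only; not the rtemis implementation.
set.seed(1)
n <- 200
x <- runif(n, -2, 2)
# Piecewise-linear signal with a jump at x = 0, plus noise
y <- ifelse(x < 0, 1 + 2 * x, 3 - x) + rnorm(n, sd = 0.2)

learn_rate <- 0.5  # plays the role of learning.rate
gamma_soft <- 0.5  # plays the role of gamma

# Step 1: linear model on the root node (all cases)
root_fit <- lm(y ~ x)
fitted_root <- unname(fitted(root_fit))
resid_root <- y - fitted_root

# Step 2: split the root node. ASSUMPTION: the split is fixed at x < 0
# here; a real tree would search for it.
in_left <- x < 0

# Step 3: soft weights. Cases inside a node get weight 1; cases outside
# it get their weight multiplied by gamma_soft instead of being dropped.
w_left <- ifelse(in_left, 1, gamma_soft)
w_right <- ifelse(in_left, gamma_soft, 1)

# Step 4: weighted linear fit to the residuals, one per child node
fit_left <- lm(resid_root ~ x, weights = w_left)
fit_right <- lm(resid_root ~ x, weights = w_right)

# Step 5: additive update. Each case receives its own node's correction,
# scaled by the learning rate.
node_update <- ifelse(in_left, fitted(fit_left), fitted(fit_right))
fitted_step <- fitted_root + learn_rate * node_update

mse_root <- mean((y - fitted_root)^2)
mse_step <- mean((y - fitted_step)^2)
```

In this sketch, setting gamma_soft to 1 would make both child fits identical to a refit on all cases, while values near 0 make each child fit depend almost entirely on its own cases.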

[gS] indicates tunable hyperparameters that can accept a vector of possible values.
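
A minimal usage sketch follows, using only arguments from the Usage section. It assumes the rtemis package and its optional dependencies are installed and that s_LINAD is exported. The simulated data and variable names are hypothetical. The structure of the returned object is not described on this page, so the sketch does not inspect it.

```r
# Simulated regression data (hypothetical, for illustration only)
set.seed(2024)
n <- 300
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- ifelse(x$x1 > 0, 2 * x$x2, -x$x3) + rnorm(n, sd = 0.3)

# Train / test split
train_idx <- sample(n, 200)
x_train <- x[train_idx, ]
y_train <- y[train_idx]
x_test <- x[-train_idx, ]
y_test <- y[-train_idx]

# ASSUMPTION: rtemis is installed. try() lets the sketch degrade
# gracefully if an optional dependency is missing.
if (requireNamespace("rtemis", quietly = TRUE)) {
  # Tune the number of leaves up to max.leaves; gamma is a [gS]
  # hyperparameter, so a vector of candidate values is passed
  mod_tuned <- try(
    rtemis::s_LINAD(
      x_train, y_train,
      x.test = x_test, y.test = y_test,
      max.leaves = 6,
      gamma = c(0.1, 0.5),
      verbose = FALSE
    ),
    silent = TRUE
  )

  # Skip tuning of the number of leaves and aim for exactly 4 leaves
  mod_forced <- try(
    rtemis::s_LINAD(
      x_train, y_train,
      force.max.leaves = 4,
      verbose = FALSE
    ),
    silent = TRUE
  )
}
```

The first call leaves the number of leaves to be chosen by cross-validation, while the second fixes the target with force.max.leaves, in which case max.leaves is ignored.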

Author(s)

E.D. Gennatas


egenn/rtemis documentation built on Oct. 28, 2024, 6:30 a.m.