s_LightRuleFit: RuleFit with LightGBM (C, R)

View source: R/s_LightRuleFit.R

s_LightRuleFit    R Documentation

RuleFit with LightGBM (C, R)

Description

Train a LightGBM gradient boosting model, extract rules from its trees, and fit a LASSO model on the extracted rules.

Usage

s_LightRuleFit(
  x,
  y = NULL,
  x.test = NULL,
  y.test = NULL,
  lgbm.mod = NULL,
  n_trees = 200,
  num_leaves = 32L,
  max_depth = 4,
  learning_rate = 0.1,
  subsample = 0.666,
  subsample_freq = 1L,
  lambda_l1 = 0,
  lambda_l2 = 0,
  objective = NULL,
  importance = FALSE,
  lgbm.ifw = TRUE,
  lgbm.grid.resample.params = setup.resample(resampler = "kfold", n.resamples = 5),
  glmnet.ifw = TRUE,
  alpha = 1,
  lambda = NULL,
  glmnet.grid.resample.params = setup.resample(resampler = "kfold", n.resamples = 5),
  grid.resample.params = setup.resample("kfold", 5),
  gridsearch.type = "exhaustive",
  metric = NULL,
  maximize = NULL,
  grid.verbose = FALSE,
  save.gridrun = FALSE,
  weights = NULL,
  empirical_risk = TRUE,
  cases_by_rules = NULL,
  save_cases_by_rules = FALSE,
  x.name = NULL,
  y.name = NULL,
  n.cores = rtCores,
  question = NULL,
  print.plot = FALSE,
  plot.fitted = NULL,
  plot.predicted = NULL,
  plot.theme = rtTheme,
  outdir = NULL,
  save.mod = if (!is.null(outdir)) TRUE else FALSE,
  verbose = TRUE,
  trace = 0,
  .gs = FALSE
)
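A minimal usage sketch, assuming the rtemis, lightgbm, and glmnet packages are installed. The synthetic data, the train/test split, and the chosen argument values are illustrative assumptions, not recommended settings. The call is guarded so that the data preparation still runs if a package is missing.

```r
# Synthetic regression data with a rule-like structure (illustrative only)
set.seed(2024)
n <- 200
x <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
# The effect of `a` depends on whether b > 0
y <- ifelse(x$b > 0, 2 * x$a, -1) + rnorm(n, sd = 0.3)

# Simple random train/test split (not part of s_LightRuleFit itself)
idx <- sample(n, size = 150)
x_train <- x[idx, ]
y_train <- y[idx]
x_test <- x[-idx, ]
y_test <- y[-idx]

# Only attempt the fit if the required packages are available
if (requireNamespace("rtemis", quietly = TRUE) &&
    requireNamespace("lightgbm", quietly = TRUE) &&
    requireNamespace("glmnet", quietly = TRUE)) {
  library(rtemis)
  mod <- s_LightRuleFit(
    x = x_train, y = y_train,
    x.test = x_test, y.test = y_test,
    n_trees = 50,        # fewer trees than the default, to keep the sketch fast
    max_depth = 3,
    learning_rate = 0.1
  )
}
```

All other arguments keep the defaults listed in the usage above.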

Arguments

x

Numeric vector or matrix / data frame of features, i.e. independent variables

y

Numeric vector of outcome, i.e. dependent variable

x.test

Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in x

y.test

Numeric vector of testing set outcome

lgbm.mod

rtMod object created by s_LightGBM. If provided, the gradient boosting step is skipped.

num_leaves

Integer: [gS] Maximum tree leaves for base learners.

max_depth

Integer: [gS] Maximum tree depth for base learners, <=0 means no limit.

learning_rate

Numeric: [gS] Boosting learning rate

subsample

Numeric: [gS] Subsample ratio of the training set.

subsample_freq

Integer: Subsample every this many iterations

lambda_l1

Numeric: [gS] L1 regularization term

lambda_l2

Numeric: [gS] L2 regularization term

objective

(Default = NULL)

importance

Logical: If TRUE, calculate variable importance

alpha

[gS] Float [0, 1]: The elasticnet mixing parameter: alpha = 0 is the ridge penalty, alpha = 1 is the lasso penalty

lambda

[gS] Float vector: Best left to NULL, cv.glmnet will compute its own lambda sequence
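To illustrate how alpha and lambda interact, the sketch below computes the elastic net penalty as defined in glmnet, lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta))). The helper name elastic_net_penalty and the coefficient vector are hypothetical and used only for illustration.

```r
# Elastic net penalty as defined in glmnet (helper name is hypothetical)
elastic_net_penalty <- function(beta, alpha, lambda) {
  stopifnot(alpha >= 0, alpha <= 1, lambda >= 0)
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

beta <- c(0.5, -2, 0)  # made-up rule coefficients

# alpha = 1: pure lasso (L1) penalty
lasso_pen <- elastic_net_penalty(beta, alpha = 1, lambda = 0.1)
# alpha = 0: pure ridge (L2) penalty
ridge_pen <- elastic_net_penalty(beta, alpha = 0, lambda = 0.1)
# alpha = 0.5: an equal mix of the two
mixed_pen <- elastic_net_penalty(beta, alpha = 0.5, lambda = 0.1)
```

With the default alpha = 1, the second stage is a LASSO fit, which tends to set many rule coefficients exactly to zero.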

grid.resample.params

List: Output of setup.resample defining grid search parameters.

gridsearch.type

Character: Type of grid search to perform: "exhaustive" or "randomized".

metric

Character: Metric to minimize, or maximize if maximize = TRUE during grid search. Default = NULL, which results in "Balanced Accuracy" for Classification, "MSE" for Regression, and "Coherence" for Survival Analysis.

maximize

Logical: If TRUE, metric will be maximized if grid search is run.

grid.verbose

Logical: Passed to gridSearchLearn

save.gridrun

Logical: If TRUE, save all grid search models
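The difference between the two gridsearch.type values can be sketched in base R. This assumes that arguments marked [gS] are tuned by grid search when given more than one value; the candidate values and the size of the random subset are made up for illustration and do not reflect how rtemis selects combinations internally.

```r
# Candidate values for three [gS] parameters (illustrative values)
grid <- expand.grid(
  max_depth = c(2, 4),
  learning_rate = c(0.01, 0.1),
  lambda_l1 = c(0, 1)
)

# "exhaustive": every combination is evaluated
n_exhaustive <- nrow(grid)

# "randomized": only a random subset of combinations is evaluated
set.seed(1)
random_grid <- grid[sample(nrow(grid), size = 4), ]
n_randomized <- nrow(random_grid)
```

Each combination would be scored by resampling as defined in grid.resample.params, and the best one chosen according to metric and maximize.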

weights

Numeric vector: Weights for cases. For classification, weights takes precedence over ifw: if weights are provided, ifw is not used. Leave weights = NULL if setting ifw = TRUE.
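This page does not define ifw. Assuming it stands for inverse frequency weighting, as elsewhere in rtemis, the sketch below shows one common formulation in base R: each case is weighted by the inverse of its class frequency. The exact computation used by lgbm.ifw and glmnet.ifw may differ.

```r
# Imbalanced outcome: three cases of class "a", one of class "b"
y <- factor(c("a", "a", "a", "b"))

# Relative class frequencies and their inverses
freq <- table(y) / length(y)
class_w <- 1 / freq

# One weight per case, looked up by class label
case_w <- as.numeric(class_w[as.character(y)])

# Total weight per class is now equal across classes
total_by_class <- tapply(case_w, y, sum)
```

A vector like case_w could also be passed as weights directly, in which case ifw is not used.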

empirical_risk

Logical: If TRUE, calculate empirical risk

cases_by_rules

Matrix of cases by rules from a previous RuleFit run. If provided, the GBM step is skipped.

save_cases_by_rules

Logical: If TRUE, save cases_by_rules to object

x.name

Character: Name for feature set

y.name

Character: Name for outcome

n.cores

Integer: Number of cores to use

question

Character: the question you are attempting to answer with this model, in plain language.

print.plot

Logical: if TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted.

plot.fitted

Logical: if TRUE, plot True (y) vs Fitted

plot.predicted

Logical: if TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test

plot.theme

Character: "zero", "dark", "box", "darkbox"

outdir

Path to output directory. If defined, will save Predicted vs. True plot, if available, as well as full model output, if save.mod is TRUE

save.mod

Logical: If TRUE, save all output to an RDS file in outdir. save.mod is TRUE by default if an outdir is defined. If set to TRUE and no outdir is defined, outdir defaults to paste0("./s.", mod.name)

verbose

Logical: If TRUE, print summary to screen.

trace

Integer: Verbosity level

.gs

(Internal use only)

Details

Based on "Predictive Learning via Rule Ensembles" by Friedman and Popescu http://statweb.stanford.edu/~jhf/ftp/RuleFit.pdf
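The core idea of RuleFit can be sketched in base R: each rule extracted from the boosted trees becomes a binary feature, and a sparse linear model is then fit on the resulting cases-by-rules matrix. The data, the rule strings, and the way they are evaluated below are assumptions made for illustration and do not necessarily match the internal representation used by rtemis.

```r
# Toy feature set (made-up values)
x <- data.frame(age = c(25, 40, 61, 33), bmi = c(22, 31, 27, 35))

# Hypothetical rules, written as logical expressions on the features
rules <- c("age > 30 & bmi > 30", "age <= 30", "bmi <= 27")

# Evaluate each rule on every case: 1 if the case satisfies the rule, else 0
cases_by_rules <- sapply(rules, function(r) {
  as.integer(eval(parse(text = r), envir = x))
})

# Rows are cases, columns are rules
dim(cases_by_rules)
```

A LASSO model fit on such a matrix keeps only the rules with nonzero coefficients, which gives a short, readable list of rules.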

Value

rtMod object

Author(s)

E.D. Gennatas

References

Friedman JH, Popescu BE, "Predictive Learning via Rule Ensembles", http://statweb.stanford.edu/~jhf/ftp/RuleFit.pdf


egenn/rtemis documentation built on Dec. 17, 2024, 6:16 p.m.