s_LightRuleFit: RuleFit with LightGBM (C, R)
In egenn/rtemis: Machine Learning and Visualization

s_LightRuleFit

R Documentation

Description

Train a LightGBM gradient boosting model, extract rules from its trees, and fit a LASSO model on the extracted rules
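
As a rough illustration of the rule-extraction idea, each rule can be thought of as a logical condition on the features, and a cases-by-rules matrix (compare the `cases_by_rules` argument) records which cases satisfy which rules. The sketch below uses base R only. The toy data, the rule strings, and the choice to represent rules as R expressions are assumptions made for illustration and are not taken from the rtemis internals.

```r
# Toy feature set (hypothetical data, for illustration only)
x <- data.frame(
  age = c(25, 40, 60, 35),
  bmi = c(22, 31, 28, 19)
)

# Hypothetical rules, written as R expressions over the columns of x.
# In a real RuleFit run these would come from the paths of the boosted trees.
rules <- c(
  "age > 30 & bmi > 25",
  "age <= 30",
  "bmi <= 20"
)

# Evaluate each rule on every case: 1 if the case satisfies the rule, else 0.
# The result has one row per case and one column per rule.
cases_by_rules <- vapply(
  rules,
  function(r) as.integer(eval(parse(text = r), envir = x)),
  integer(nrow(x))
)

print(cases_by_rules)
```

A sparse linear model such as the LASSO can then be fit with these binary columns as predictors, so that only a subset of rules receives nonzero coefficients.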

Usage

s_LightRuleFit(
  x,
  y = NULL,
  x.test = NULL,
  y.test = NULL,
  lgbm.mod = NULL,
  n_trees = 200,
  num_leaves = 32L,
  max_depth = 4,
  learning_rate = 0.1,
  subsample = 0.666,
  subsample_freq = 1L,
  lambda_l1 = 0,
  lambda_l2 = 0,
  objective = NULL,
  importance = FALSE,
  lgbm.ifw = TRUE,
  lgbm.grid.resample.params = setup.resample(resampler = "kfold", n.resamples = 5),
  glmnet.ifw = TRUE,
  alpha = 1,
  lambda = NULL,
  glmnet.grid.resample.params = setup.resample(resampler = "kfold", n.resamples = 5),
  grid.resample.params = setup.resample("kfold", 5),
  gridsearch.type = "exhaustive",
  metric = NULL,
  maximize = NULL,
  grid.verbose = FALSE,
  save.gridrun = FALSE,
  weights = NULL,
  empirical_risk = TRUE,
  cases_by_rules = NULL,
  save_cases_by_rules = FALSE,
  x.name = NULL,
  y.name = NULL,
  n.cores = rtCores,
  question = NULL,
  print.plot = FALSE,
  plot.fitted = NULL,
  plot.predicted = NULL,
  plot.theme = rtTheme,
  outdir = NULL,
  save.mod = if (!is.null(outdir)) TRUE else FALSE,
  verbose = TRUE,
  trace = 0,
  .gs = FALSE
)

Arguments

`x`	Numeric vector or matrix / data frame of features i.e. independent variables
`y`	Numeric vector of outcome, i.e. dependent variable
`x.test`	Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in `x`
`y.test`	Numeric vector of testing set outcome
`lgbm.mod`	rtMod object created by s_LightGBM. If provided, the gradient boosting step is skipped.
`num_leaves`	Integer: [gS] Maximum tree leaves for base learners.
`max_depth`	Integer: [gS] Maximum tree depth for base learners, <=0 means no limit.
`learning_rate`	Numeric: [gS] Boosting learning rate
`subsample`	Numeric: [gS] Subsample ratio of the training set.
`subsample_freq`	Integer: Subsample every this many iterations
`lambda_l1`	Numeric: [gS] L1 regularization term
`lambda_l2`	Numeric: [gS] L2 regularization term
`objective`	(Default = NULL)
`importance`	Logical: If `TRUE`, calculate variable importance
`alpha`	[gS] Float [0, 1]: The elastic net mixing parameter: `alpha = 0` is the ridge penalty, `alpha = 1` is the lasso penalty
`lambda`	[gS] Float vector: Best left as NULL, in which case `cv.glmnet` will compute its own lambda sequence
`grid.resample.params`	List: Output of setup.resample defining grid search parameters.
`gridsearch.type`	Character: Type of grid search to perform: "exhaustive" or "randomized".
`metric`	Character: Metric to minimize, or maximize if `maximize = TRUE` during grid search. Default = NULL, which results in "Balanced Accuracy" for Classification, "MSE" for Regression, and "Coherence" for Survival Analysis.
`maximize`	Logical: If TRUE, `metric` will be maximized if grid search is run.
`grid.verbose`	Logical: Passed to `gridSearchLearn`
`save.gridrun`	Logical: If `TRUE`, save all grid search models
`weights`	Numeric vector: Weights for cases. For classification, `weights` takes precedence over `ifw`: if `weights` are provided, `ifw` is not used. Leave `weights = NULL` if setting `ifw = TRUE`.
`empirical_risk`	Logical: If TRUE, calculate empirical risk
`cases_by_rules`	Matrix of cases by rules from a previous RuleFit run. If provided, the GBM step is skipped.
`save_cases_by_rules`	Logical: If TRUE, save cases_by_rules to object
`x.name`	Character: Name for feature set
`y.name`	Character: Name for outcome
`n.cores`	Integer: Number of cores to use
`question`	Character: the question you are attempting to answer with this model, in plain language.
`print.plot`	Logical: if TRUE, produce plot using `mplot3`. Takes precedence over `plot.fitted` and `plot.predicted`.
`plot.fitted`	Logical: if TRUE, plot True (y) vs Fitted
`plot.predicted`	Logical: if TRUE, plot True (y.test) vs Predicted. Requires `x.test` and `y.test`
`plot.theme`	Character: "zero", "dark", "box", "darkbox"
`outdir`	Path to output directory. If defined, will save Predicted vs. True plot, if available, as well as full model output, if `save.mod` is TRUE
`save.mod`	Logical: If TRUE, save all output to an RDS file in `outdir`. `save.mod` is TRUE by default if an `outdir` is defined. If set to TRUE and no `outdir` is defined, `outdir` defaults to `paste0("./s.", mod.name)`
`verbose`	Logical: If TRUE, print summary to screen.
`trace`	Integer: Verbosity level
`.gs`	(Internal use only)
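
Examples

The following is a minimal usage sketch, not an official example from the package. It assumes that rtemis, lightgbm, and glmnet are installed and that the installed rtemis version exports `s_LightRuleFit` with the signature shown under Usage. The synthetic data, the variable names, and the values chosen for `n_trees` and `max_depth` are illustrative assumptions.

```r
# Synthetic regression data (hypothetical, for illustration only)
set.seed(2024)
n <- 400
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
# Outcome with a rule-like component (x1 > 0 and x2 < 0.5) plus a linear term
y <- ifelse(x$x1 > 0 & x$x2 < 0.5, 2, 0) + x$x3 + rnorm(n, sd = 0.3)

# Simple train/test split
train_idx <- seq_len(300)
x_train <- x[train_idx, ]
y_train <- y[train_idx]
x_test <- x[-train_idx, ]
y_test <- y[-train_idx]

# Run only if the required packages are available (assumption: all three are needed)
if (requireNamespace("rtemis", quietly = TRUE) &&
    requireNamespace("lightgbm", quietly = TRUE) &&
    requireNamespace("glmnet", quietly = TRUE)) {
  library(rtemis)
  mod <- s_LightRuleFit(
    x = x_train, y = y_train,
    x.test = x_test, y.test = y_test,
    n_trees = 50,   # fewer trees than the default to keep the sketch fast
    max_depth = 3   # shallower trees yield shorter rules
  )
}
```

Keeping `alpha = 1` (the default) corresponds to the LASSO penalty described under Arguments, and leaving `lambda = NULL` lets `cv.glmnet` choose its own lambda sequence.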