s_LightRF: Random Forest using LightGBM

View source: R/s_LightGBM.R

s_LightRF    R Documentation

Random Forest using LightGBM

Description

Random Forest using LightGBM

Usage

s_LightRF(
  x,
  y = NULL,
  x.test = NULL,
  y.test = NULL,
  x.name = NULL,
  y.name = NULL,
  weights = NULL,
  ifw = TRUE,
  ifw.type = 2,
  upsample = FALSE,
  downsample = FALSE,
  resample.seed = NULL,
  objective = NULL,
  nrounds = 500L,
  early_stopping_rounds = -1L,
  num_leaves = 4096L,
  max_depth = -1L,
  learning_rate = 1,
  feature_fraction = 1,
  subsample = 0.623,
  subsample_freq = 1L,
  lambda_l1 = 0,
  lambda_l2 = 0,
  max_cat_threshold = 32L,
  min_data_per_group = 32L,
  linear_tree = FALSE,
  tree_learner = "data_parallel",
  grid.resample.params = setup.resample("kfold", 5),
  gridsearch.type = "exhaustive",
  metric = NULL,
  maximize = NULL,
  importance = TRUE,
  print.plot = FALSE,
  plot.fitted = NULL,
  plot.predicted = NULL,
  plot.theme = rtTheme,
  question = NULL,
  verbose = TRUE,
  grid.verbose = FALSE,
  lightgbm_verbose = -1,
  save.gridrun = FALSE,
  n.cores = 1,
  n_threads = rtCores,
  force_col_wise = FALSE,
  force_row_wise = FALSE,
  outdir = NULL,
  save.mod = ifelse(!is.null(outdir), TRUE, FALSE),
  .gs = FALSE,
  ...
)

Arguments

x

Numeric vector or matrix / data frame of features, i.e. independent variables

y

Numeric vector of outcome, i.e. dependent variable

x.test

Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in x

y.test

Numeric vector of testing set outcome

x.name

Character: Name for feature set

y.name

Character: Name for outcome

weights

Numeric vector: Weights for cases. For classification, weights takes precedence over ifw: if weights are provided, ifw is not used. Leave NULL if setting ifw = TRUE.

ifw

Logical: If TRUE, apply inverse frequency weighting (for Classification only). Note: If weights are provided, ifw is not used.

ifw.type

Integer {0, 1, 2}. 1: class.weights as in 0, divided by min(class.weights). 2: class.weights as in 0, divided by max(class.weights).
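The three ifw.type variants can be illustrated with a minimal base R sketch. The helper name ifw_sketch is hypothetical and not part of rtemis. The type 0 weights are assumed here to be 1 / class frequency, since the text above does not define them.

```r
# Hypothetical helper, not part of rtemis: illustrates inverse frequency weighting.
# Assumption: type 0 weights are 1 / class frequency (not stated in the documentation).
ifw_sketch <- function(y, type = 2) {
  y <- as.factor(y)
  freq <- table(y)                        # class counts
  class.weights <- as.numeric(1 / freq)   # assumed type 0 weights
  names(class.weights) <- names(freq)
  if (type == 1) {
    class.weights <- class.weights / min(class.weights)
  } else if (type == 2) {
    class.weights <- class.weights / max(class.weights)
  }
  # Return one weight per case, looked up by class label
  unname(class.weights[as.character(y)])
}

# Imbalanced outcome: 80 cases of class "a", 20 of class "b"
y <- factor(c(rep("a", 80), rep("b", 20)))
w <- ifw_sketch(y, type = 2)
tapply(w, y, unique)
```

Under this assumption, type 2 scales the weights so that the rarest class has weight 1 and every other class has a weight below 1. Type 1 scales them so that the most frequent class has weight 1.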

upsample

Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness.

downsample

Logical: If TRUE, downsample majority class to match size of minority class

resample.seed

Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed)
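As a rough illustration of class balancing by downsampling, the following base R sketch draws the same number of cases from every class. The helper downsample_sketch and its seed argument are hypothetical and do not reflect the rtemis implementation.

```r
# Hypothetical helper, not part of rtemis: downsample every class
# to the size of the smallest class.
downsample_sketch <- function(x, y, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # reproducible sampling when a seed is given
  y <- as.factor(y)
  n_min <- min(table(y))               # size of the minority class
  idx <- unlist(lapply(levels(y), function(lv) {
    cases <- which(y == lv)
    # Index through sample.int() so a single-case class is handled correctly
    cases[sample.int(length(cases), n_min)]
  }))
  list(x = x[idx, , drop = FALSE], y = y[idx])
}

x <- data.frame(a = rnorm(100), b = rnorm(100))
y <- factor(c(rep("neg", 85), rep("pos", 15)))
balanced <- downsample_sketch(x, y, seed = 2024)
table(balanced$y)
```

Sampling is done without replacement, so each class ends up with exactly as many cases as the minority class.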

objective

(Default = NULL)

nrounds

Integer: Number of trees to grow

early_stopping_rounds

Integer: Training on resamples of x (tuning) will stop if performance does not improve for this many rounds

num_leaves

Integer: [gS] Maximum tree leaves for base learners.

max_depth

Integer: [gS] Maximum tree depth for base learners, <=0 means no limit.

learning_rate

Numeric: [gS] Boosting learning rate

feature_fraction

Numeric (0, 1): [gS] Fraction of features to consider at each iteration (i.e. tree)

subsample

Numeric: [gS] Subsample ratio of the training set.

subsample_freq

Integer: Subsample every this many iterations
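The defaults above resemble a LightGBM random forest configuration. The sketch below shows how they could be expressed as a LightGBM parameter list. It is an assumed mapping, not taken from the rtemis source. In LightGBM, subsample and subsample_freq are aliases of bagging_fraction and bagging_freq, and random forest mode requires that row bagging or feature subsampling is active.

```r
# Assumed mapping of the s_LightRF defaults onto LightGBM parameters.
# This list is illustrative and is not copied from the rtemis source.
rf_params <- list(
  boosting = "rf",            # LightGBM random forest mode
  learning_rate = 1,
  num_leaves = 4096L,
  max_depth = -1L,            # <= 0 means no depth limit
  feature_fraction = 1,
  bagging_fraction = 0.623,   # LightGBM's canonical name for subsample
  bagging_freq = 1L           # LightGBM's canonical name for subsample_freq
)

# Random forest mode needs row bagging or feature subsampling to be enabled
rf_ok <- (rf_params$bagging_freq > 0 && rf_params$bagging_fraction < 1) ||
  rf_params$feature_fraction < 1
rf_ok
```

With feature_fraction = 1, the condition is met only through row bagging, which is why subsample below 1 and subsample_freq above 0 matter for this learner.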

lambda_l1

Numeric: [gS] L1 regularization term

lambda_l2

Numeric: [gS] L2 regularization term

max_cat_threshold

Integer: Max number of splits to consider for categorical variable

min_data_per_group

Integer: Minimum number of observations per categorical group

linear_tree

Logical: [gS] If TRUE, use linear trees

tree_learner

Character: [gS] "serial", "feature", "data", "voting". The default "data_parallel" is a LightGBM alias for "data".

grid.resample.params

List: Output of setup.resample defining grid search parameters.

gridsearch.type

Character: Type of grid search to perform: "exhaustive" or "randomized".
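The difference between the two search types can be sketched in base R. The candidate values below are chosen for illustration only. How rtemis builds and samples its grid internally is not shown here.

```r
# Candidate values for two [gS] parameters (illustrative values only)
grid <- expand.grid(
  num_leaves = c(32L, 256L, 4096L),
  feature_fraction = c(0.5, 1)
)
nrow(grid)   # an exhaustive search evaluates every row: 3 x 2 = 6 combinations

# A randomized search evaluates a random subset of the rows instead
set.seed(1)
random_grid <- grid[sample(nrow(grid), 3), ]
nrow(random_grid)
```

Each combination is evaluated on every resample, so with the default 5-fold setup in grid.resample.params the exhaustive grid above would correspond to 6 x 5 = 30 model fits.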

metric

Character: Metric to minimize, or maximize if maximize = TRUE during grid search. Default = NULL, which results in "Balanced Accuracy" for Classification, "MSE" for Regression, and "Coherence" for Survival Analysis.

maximize

Logical: If TRUE, metric will be maximized if grid search is run.

importance

Logical: If TRUE, calculate variable importance

print.plot

Logical: if TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted.

plot.fitted

Logical: if TRUE, plot True (y) vs Fitted

plot.predicted

Logical: if TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test

plot.theme

Character: "zero", "dark", "box", "darkbox"

question

Character: the question you are attempting to answer with this model, in plain language.

verbose

Logical: If TRUE, print summary to screen.

grid.verbose

Logical: Passed to gridSearchLearn

lightgbm_verbose

Integer: Passed to lightgbm::lgb.train. < 0: Fatal, 0: Error (Warning), 1: Info, > 1: Debug

save.gridrun

Logical: If TRUE, save all grid search models

n.cores

Integer: Number of cores to use.

n_threads

Integer: Number of threads for lightgbm using OpenMP. Parallelize either the resamples (using n.cores) or the lightgbm execution (using this setting), not both.

force_col_wise

Logical: If TRUE, force column-wise histogram building (See https://lightgbm.readthedocs.io/en/latest/Parameters.html)

force_row_wise

Logical: If TRUE, force row-wise histogram building (See https://lightgbm.readthedocs.io/en/latest/Parameters.html)

outdir

Path to output directory. If defined, will save Predicted vs. True plot, if available, as well as full model output, if save.mod is TRUE

save.mod

Logical: If TRUE, save all output to an RDS file in outdir. save.mod is TRUE by default if an outdir is defined. If set to TRUE and no outdir is defined, outdir defaults to paste0("./s.", mod.name)

.gs

(Internal use only)

...

Extra arguments appended to lgb.train's params.

Author(s)

ED Gennatas

Examples

## Not run: 
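library(rtemis)
# Simulate 500 cases with 10 normally distributed features
x <- rnormmat(500, 10)
# Outcome depends on features 3 and 5, plus noise
y <- x[, 3] + .5 * x[, 5]^2 + rnorm(500)
# Combine features and outcome into a single data frame
dat <- data.frame(x, y)
mod <- s_LightRF(dat)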

## End(Not run)

egenn/rtemis documentation built on Nov. 22, 2024, 4:12 a.m.