s_RF: Random Forest Classification and Regression (C, R)

View source: R/s_RF.R

Description

Train a Random Forest for regression or classification using the randomForest package.

Usage

s_RF(
  x,
  y = NULL,
  x.test = NULL,
  y.test = NULL,
  x.name = NULL,
  y.name = NULL,
  n.trees = 1000,
  autotune = FALSE,
  n.trees.try = 1000,
  stepFactor = 1.5,
  mtry = NULL,
  nodesize = NULL,
  maxnodes = NULL,
  mtryStart = mtry,
  grid.resample.params = setup.resample("kfold", 5),
  metric = NULL,
  maximize = NULL,
  classwt = NULL,
  ifw = TRUE,
  ifw.type = 2,
  upsample = FALSE,
  downsample = FALSE,
  resample.seed = NULL,
  importance = TRUE,
  proximity = FALSE,
  replace = TRUE,
  strata = NULL,
  sampsize = if (replace) nrow(x) else ceiling(0.632 * nrow(x)),
  sampsize.ratio = NULL,
  do.trace = NULL,
  tune.do.trace = FALSE,
  imetrics = FALSE,
  n.cores = rtCores,
  print.tune.plot = FALSE,
  print.plot = FALSE,
  plot.fitted = NULL,
  plot.predicted = NULL,
  plot.theme = rtTheme,
  proximity.tsne = FALSE,
  discard.forest = FALSE,
  tsne.perplexity = 5,
  plot.tsne.train = FALSE,
  plot.tsne.test = FALSE,
  question = NULL,
  verbose = TRUE,
  grid.verbose = verbose,
  outdir = NULL,
  save.mod = ifelse(!is.null(outdir), TRUE, FALSE),
  ...
)

Arguments

x

Numeric vector or matrix / data frame of features, i.e. independent variables

y

Numeric vector (Regression) or factor (Classification) of outcome, i.e. dependent variable

x.test

Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in x

y.test

Numeric vector (Regression) or factor (Classification) of testing set outcome

x.name

Character: Name for feature set

y.name

Character: Name for outcome

n.trees

Integer: Number of trees to grow. Default = 1000

autotune

Logical: If TRUE, use randomForest::tuneRF to determine mtry

n.trees.try

Integer: Number of trees to train for tuning, if autotune = TRUE

stepFactor

Float: If autotune = TRUE, at each tuning iteration, mtry is multiplied or divided by this value. Default = 1.5

mtry

[gS] Integer: Number of features sampled randomly at each split

nodesize

[gS] Integer: Minimum size of terminal nodes. Default = 5 (Regression); 1 (Classification)

maxnodes

[gS] Integer: Maximum number of terminal nodes in a tree. Default = NULL, i.e. trees are grown to the maximum possible size
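Arguments marked [gS] follow the rtemis convention for grid-search parameters: supplying more than one value is assumed to trigger a grid search, resampled according to grid.resample.params. The base-R sketch below only illustrates how two such arguments would expand into candidate combinations, assuming an exhaustive search over the full Cartesian grid; the values are illustrative and the actual search is run internally by rtemis.

```r
# Illustrative values for two [gS] arguments.
# ASSUMPTION: exhaustive search, i.e. every combination is evaluated.
grid <- expand.grid(mtry = c(2, 4), nodesize = c(1, 5, 10))
grid
nrow(grid)  # 6 combinations
```

Under that assumption, a call such as s_RF(x, y, mtry = c(2, 4), nodesize = c(1, 5, 10)) would evaluate these six combinations and select one according to metric and maximize.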

mtryStart

Integer: If autotune = TRUE, start at this value for mtry
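When autotune = TRUE, randomForest::tuneRF moves mtry away from mtryStart by multiplying or dividing it by stepFactor. The sketch below lists the candidate values such a search could visit in each direction. It assumes the rounding used by tuneRF (floor when stepping up, capped at the number of features; ceiling when stepping down, never below 1) and deliberately omits tuneRF's out-of-bag error check, which stops a direction early when the improvement is too small. mtry_path() is a hypothetical helper, not part of rtemis or randomForest.

```r
# Hypothetical helper: candidate mtry values visited in one direction,
# ignoring the OOB-error improvement check that makes tuneRF stop early.
# ASSUMPTION: floor/ceiling rounding as described in the lead-in.
mtry_path <- function(mtry_start, n_features, step_factor,
                      direction = c("up", "down")) {
  direction <- match.arg(direction)
  path <- numeric(0)
  current <- mtry_start
  repeat {
    previous <- current
    current <- if (direction == "down") {
      max(1, ceiling(current / step_factor))
    } else {
      min(n_features, floor(current * step_factor))
    }
    if (current == previous) break  # no new candidate: stop
    path <- c(path, current)
  }
  path
}

mtry_path(4, 20, 1.5, "up")    # 6 9 13 19 20
mtry_path(4, 20, 1.5, "down")  # 3 2
```

With stepFactor = 1.5 and 20 features, stepping down stops at 2 because ceiling(2 / 1.5) is 2 again, so in this sketch mtry = 1 is never reached from a start of 4.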

grid.resample.params

List: Output of setup.resample defining grid search parameters.

metric

Character: Metric to minimize (or maximize, if maximize = TRUE) during grid search. Default = NULL, which results in "Balanced Accuracy" for Classification, "MSE" for Regression, and "Coherence" for Survival Analysis.
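For reference, a plain-R sketch of the two default metrics named above: balanced accuracy (the mean of per-class recall, higher is better) and mean squared error (lower is better). These are the standard definitions written as hypothetical helpers; rtemis computes its metrics internally and this is not its code.

```r
# Hypothetical helper: balanced accuracy = mean of per-class recall.
balanced_accuracy <- function(true, predicted) {
  true <- as.character(true)
  predicted <- as.character(predicted)
  classes <- sort(unique(true))
  # Recall for each class: fraction of its true cases predicted correctly
  recall <- vapply(classes, function(cl) mean(predicted[true == cl] == cl),
                   numeric(1))
  mean(recall)
}

# Hypothetical helper: mean squared error for regression.
mse <- function(true, estimated) mean((true - estimated)^2)

balanced_accuracy(c("a", "a", "a", "b"), c("a", "a", "b", "b"))  # (2/3 + 1) / 2
mse(c(1, 2, 3), c(1, 2, 5))  # 4 / 3
```

Balanced accuracy weights each class equally, so a classifier that always predicts the majority class scores 0.5 on a binary problem regardless of how imbalanced it is.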

maximize

Logical: If TRUE, metric will be maximized if grid search is run.

classwt

Vector, Float: Priors of the classes, for Classification only. Need not add up to 1.

ifw

Logical: If TRUE, apply inverse frequency weighting (for Classification only). Note: If weights are provided, ifw is not used.

ifw.type

Integer 0, 1, 2: Type of inverse frequency weighting:
0: class.weights equal to the inverse class frequency
1: class.weights as in 0, divided by min(class.weights)
2: class.weights as in 0, divided by max(class.weights)
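A minimal sketch of the three ifw.type variants, assuming that type 0 means 1 / (class proportion). Types 1 and 2 only rescale those weights, so they give the same result whether type 0 is based on proportions or raw counts. ifw_class_weights() is a hypothetical helper, not the rtemis implementation, and how rtemis turns these class weights into per-case weights is not shown.

```r
# Hypothetical helper: inverse frequency class weights.
# ASSUMPTION: type 0 = 1 / class proportion; types 1 and 2 rescale
# by the minimum or maximum weight, as listed for ifw.type.
ifw_class_weights <- function(y, type = 2) {
  y <- factor(y)
  props <- as.vector(table(y)) / length(y)
  w <- 1 / props
  if (type == 1) w <- w / min(w)  # most frequent class gets weight 1
  if (type == 2) w <- w / max(w)  # least frequent class gets weight 1
  setNames(w, levels(y))
}

y <- factor(c(rep("a", 80), rep("b", 20)))
ifw_class_weights(y, type = 1)  # a: 1, b: 4
ifw_class_weights(y, type = 2)  # a: 0.25, b: 1
```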

upsample

Logical: If TRUE, upsample training set cases not belonging to the majority outcome group

downsample

Logical: If TRUE, downsample majority class to match size of minority class
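A minimal base-R sketch of the two class-balancing options. It assumes that upsampling keeps every original case and adds random duplicates of each smaller class until it matches the majority class, and that downsampling draws each class without replacement down to the minority class size. resample_classes() is a hypothetical helper, not the rtemis implementation; its seed argument plays the role of resample.seed.

```r
# Hypothetical helper (not rtemis code): balance classes by up- or downsampling.
resample_classes <- function(x, y, method = c("up", "down"), seed = NULL) {
  method <- match.arg(method)
  y <- droplevels(factor(y))
  if (!is.null(seed)) set.seed(seed)  # analogous to resample.seed
  counts <- table(y)
  target <- if (method == "up") max(counts) else min(counts)
  idx <- unlist(lapply(levels(y), function(lev) {
    lev_idx <- which(y == lev)
    n <- length(lev_idx)
    if (method == "up") {
      # keep every original case, then add random duplicates up to target
      c(lev_idx, lev_idx[sample.int(n, target - n, replace = TRUE)])
    } else {
      # keep a random subset of size target, without replacement
      lev_idx[sample.int(n, target)]
    }
  }))
  list(x = x[idx, , drop = FALSE], y = y[idx])
}

x <- data.frame(feature = 1:10)
y <- factor(c(rep("majority", 8), rep("minority", 2)))
table(resample_classes(x, y, "up", seed = 1)$y)    # 8 and 8
table(resample_classes(x, y, "down", seed = 1)$y)  # 2 and 2
```

Upsampling grows the training set (here from 10 to 16 cases) and repeats minority cases, while downsampling discards majority cases (here leaving 4), which is the usual trade-off between the two.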

resample.seed

Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed)

importance

Logical: If TRUE, estimate variable relative importance.

proximity

Logical: If TRUE, calculate proximity measure among cases.

replace

Logical: If TRUE, sample cases with replacement during training.

strata

Vector, Factor: Will be used for stratified sampling

sampsize

Integer: Size of sample to draw. In Classification, if strata is defined, this can be a vector of length equal to the number of strata, in which case the corresponding values determine how many cases are drawn from each stratum.
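With replace = FALSE, the default sampsize is ceiling(0.632 * nrow(x)). The constant matches the expected fraction of distinct cases in an ordinary bootstrap sample, 1 - (1 - 1/n)^n, which approaches 1 - 1/e ≈ 0.632; presumably it is chosen so that each tree sees about as many distinct cases as it would with replacement (that rationale is an inference, not stated on this page). The sketch computes the expectation and checks it by simulation.

```r
# Expected fraction of distinct cases in a bootstrap sample of size n
# drawn with replacement from n cases: 1 - (1 - 1/n)^n -> 1 - exp(-1)
n <- 1000
expected_unique <- 1 - (1 - 1 / n)^n
expected_unique  # about 0.632

# Simulation check: average fraction of distinct indices over 200 bootstraps
set.seed(2024)
sim_unique <- mean(replicate(200, length(unique(sample.int(n, n, replace = TRUE))) / n))
sim_unique
```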

sampsize.ratio

Float (0, 1): Heuristic to increase sensitivity with imbalanced classes. Bootstrap samples are drawn with replacement from the minority class, each of length N, the number of minority class cases; (sampsize.ratio * N) cases are selected from the majority class.
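A minimal sketch of the per-class sample sizes implied by the description above, assuming a binary outcome and rounding the majority class size up. ratio_sampsize() is a hypothetical helper; the result could be given to randomForest::randomForest as sampsize together with strata = y, but whether rtemis builds it exactly this way is an assumption.

```r
# Hypothetical helper: per-class sample sizes for a binary outcome.
# ASSUMPTION: minority class gets N (its own size); majority class gets
# ceiling(sampsize.ratio * N).
ratio_sampsize <- function(y, sampsize.ratio) {
  y <- factor(y)
  counts <- as.vector(table(y))
  n_min <- min(counts)
  sizes <- ifelse(counts == n_min, n_min, ceiling(sampsize.ratio * n_min))
  setNames(sizes, levels(y))
}

y <- factor(c(rep("neg", 80), rep("pos", 20)))
ratio_sampsize(y, 0.5)  # neg: 10, pos: 20
```

Because sampsize.ratio is below 1, each tree sees more minority than majority cases, which shifts predictions toward the minority class and raises its sensitivity.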

do.trace

Logical or integer: If TRUE, randomForest will output information while it is running. If an integer, randomForest will report progress every this many trees. Default = n.trees/10 if verbose = TRUE

tune.do.trace

Same as do.trace but for tuning, when autotune = TRUE

imetrics

Logical: If TRUE, calculate interpretability metrics (number of trees and number of nodes) and save them under the extra field of the rtMod object

n.cores

Integer: Number of cores to use.

print.tune.plot

Logical: passed to randomForest::tuneRF.

print.plot

Logical: If TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted.

plot.fitted

Logical: if TRUE, plot True (y) vs Fitted

plot.predicted

Logical: if TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test

plot.theme

Character: "zero", "dark", "box", "darkbox"

proximity.tsne

Logical: If TRUE, perform t-SNE on proximity matrix. Will be saved under 'extra' field of rtMod. Default = FALSE

discard.forest

Logical: If TRUE, remove forest from rtMod object to save space. Default = FALSE

tsne.perplexity

Numeric: Perplexity parameter for Rtsne::Rtsne

plot.tsne.train

Logical: If TRUE, plot training set t-SNE projections

plot.tsne.test

Logical: If TRUE, plot testing set t-SNE projections

question

Character: the question you are attempting to answer with this model, in plain language.

verbose

Logical: If TRUE, print summary to screen.

grid.verbose

Logical: Passed to gridSearchLearn

outdir

String, Optional: Path to directory to save output

save.mod

Logical: If TRUE, save all output to an RDS file in outdir. save.mod is TRUE by default if an outdir is defined. If set to TRUE and no outdir is defined, outdir defaults to paste0("./s.", mod.name)

...

Additional arguments to be passed to randomForest::randomForest

Details

If autotune = TRUE, randomForest::tuneRF will be run to determine the best mtry value.

Value

rtMod object

Author(s)

E.D. Gennatas

See Also

train_cv for external cross-validation

Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()

Other Tree-based methods: s_AdaBoost(), s_AddTree(), s_BART(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GBM(), s_GLMTree(), s_H2OGBM(), s_H2ORF(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MLRF(), s_RFSRC(), s_Ranger(), s_XGBoost(), s_XRF()

Other Ensembles: s_AdaBoost(), s_GBM(), s_Ranger()


egenn/rtemis documentation built on Oct. 28, 2024, 6:30 a.m.