s_Ranger: Random Forest Classification and Regression (C, R)
In egenn/rtemis: Machine Learning and Visualization

s_Ranger

R Documentation

Random Forest Classification and Regression (C, R)

Description

Train a Random Forest for regression or classification using ranger.

Usage

s_Ranger(
  x,
  y = NULL,
  x.test = NULL,
  y.test = NULL,
  x.name = NULL,
  y.name = NULL,
  n.trees = 1000,
  weights = NULL,
  ifw = TRUE,
  ifw.type = 2,
  ifw.case.weights = TRUE,
  ifw.class.weights = FALSE,
  upsample = FALSE,
  downsample = FALSE,
  resample.seed = NULL,
  autotune = FALSE,
  classwt = NULL,
  n.trees.try = 500,
  stepFactor = 2,
  mtry = NULL,
  mtryStart = NULL,
  inbag.resample = NULL,
  stratify.on.y = FALSE,
  grid.resample.params = setup.resample("kfold", 5),
  gridsearch.type = c("exhaustive", "randomized"),
  gridsearch.randomized.p = 0.1,
  metric = NULL,
  maximize = NULL,
  probability = NULL,
  importance = "impurity",
  local.importance = FALSE,
  replace = TRUE,
  min.node.size = NULL,
  splitrule = NULL,
  strata = NULL,
  sampsize = if (replace) nrow(x) else ceiling(0.632 * nrow(x)),
  tune.do.trace = FALSE,
  imetrics = FALSE,
  n.cores = rtCores,
  print.tune.plot = FALSE,
  print.plot = FALSE,
  plot.fitted = NULL,
  plot.predicted = NULL,
  plot.theme = rtTheme,
  question = NULL,
  grid.verbose = verbose,
  verbose = TRUE,
  outdir = NULL,
  save.mod = ifelse(!is.null(outdir), TRUE, FALSE),
  ...
)
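The default for `sampsize` shown above depends on `replace`. The base R sketch below only evaluates that default expression for a given number of rows. `default_sampsize` is a hypothetical helper for illustration, not an rtemis function. The 0.632 factor matches the expected fraction of distinct cases in a bootstrap sample, 1 - (1 - 1/n)^n, which approaches 1 - exp(-1) for large n. Sampling without replacement therefore draws roughly as many distinct cases as a bootstrap would.

```r
# Hypothetical helper (not part of rtemis): evaluates the same expression
# used as the default for `sampsize` in the s_Ranger() signature.
default_sampsize <- function(n_rows, replace = TRUE) {
  if (replace) n_rows else ceiling(0.632 * n_rows)
}

default_sampsize(150, replace = TRUE)   # 150
default_sampsize(150, replace = FALSE)  # ceiling(94.8) = 95

# Expected fraction of distinct cases in a large bootstrap sample
1 - exp(-1)  # about 0.632
```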

Arguments

`x`	Numeric vector or matrix / data frame of features, i.e. independent variables
`y`	Numeric vector of outcome, i.e. dependent variable
`x.test`	Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in `x`
`y.test`	Numeric vector of testing set outcome
`x.name`	Character: Name for feature set
`y.name`	Character: Name for outcome
`n.trees`	Integer: Number of trees to grow. Default = 1000
`weights`	Numeric vector: Weights for cases. For classification, `weights` takes precedence over `ifw`: if `weights` are provided, `ifw` is not used, so leave `weights = NULL` if setting `ifw = TRUE`.
`ifw`	Logical: If TRUE, apply inverse frequency weighting (for Classification only). Note: If `weights` are provided, `ifw` is not used.
`ifw.type`	Integer: 0, 1, or 2. 1: class.weights as in 0, divided by min(class.weights); 2: class.weights as in 0, divided by max(class.weights)
`ifw.case.weights`	Logical: If TRUE, define ranger's `case.weights` using IPW. Default = TRUE. Note: Cannot be used together with `stratify.on.y` or `inbag.resample`
`ifw.class.weights`	Logical: If TRUE, define ranger's `class.weights` using IPW. Default = FALSE
`upsample`	Logical: If TRUE, upsample training set cases not belonging to the majority outcome group
`downsample`	Logical: If TRUE, downsample the majority class to match the size of the minority class
`resample.seed`	Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed)
`autotune`	Logical: If TRUE, use `randomForest::tuneRF` to determine `mtry`
`classwt`	Vector, Float: Priors of the classes for `randomForest::tuneRF` if `autotune = TRUE`. For classification only; need not add up to 1
`n.trees.try`	Integer: Number of trees to train for tuning, if `autotune = TRUE`
`stepFactor`	Float: If `autotune = TRUE`, at each tuning iteration, `mtry` is multiplied or divided by this value. Default = 2
`mtry`	[gS] Integer: Number of features sampled randomly at each split. Defaults to the square root of the number of features for classification and a third of the number of features for regression; see Details for datasets with 20 or fewer features.
`mtryStart`	Integer: If `autotune = TRUE`, start at this value for `mtry`
`inbag.resample`	List, length `n.trees`: Output of setup.resample to define resamples used for each tree. Default = NULL
`stratify.on.y`	Logical: If TRUE, overrides `inbag.resample` to use stratified bootstraps for each tree. This can help improve test set performance in imbalanced datasets. Default = FALSE. Note: Cannot be used with `ifw.case.weights`
`grid.resample.params`	List: Output of setup.resample defining grid search parameters.
`gridsearch.type`	Character: Type of grid search to perform: "exhaustive" or "randomized".
`gridsearch.randomized.p`	Float (0, 1): If `gridsearch.type = "randomized"`, randomly test this proportion of combinations.
`metric`	Character: Metric to minimize during grid search, or to maximize if `maximize = TRUE`. Default = NULL, which results in "Balanced Accuracy" for Classification, "MSE" for Regression, and "Coherence" for Survival Analysis.
`maximize`	Logical: If TRUE, `metric` will be maximized if grid search is run.
`probability`	Logical: If TRUE, grow a probability forest. See `ranger::ranger`. Default = FALSE
`importance`	Character: "none", "impurity", "impurity_corrected", or "permutation". Default = "impurity"
`local.importance`	Logical: If TRUE, return local importance values. Only applicable if `importance` is set to "permutation".
`replace`	Logical: If TRUE, sample cases with replacement during training.
`min.node.size`	[gS] Integer: Minimum node size
`splitrule`	Character: For classification: "gini" (Default) or "extratrees"; for regression: "variance" (Default), "extratrees" or "maxstat"; for survival: "logrank" (Default), "extratrees", "C" or "maxstat".
`strata`	Vector, Factor: Will be used for stratified sampling
`sampsize`	Integer: Size of sample to draw. In Classification, if `strata` is defined, this can be a vector with one value per stratum, giving the number of cases to draw from each stratum.
`tune.do.trace`	Same as `do.trace` but for tuning, when `autotune = TRUE`
`imetrics`	Logical: If TRUE, calculate interpretability metrics (N of trees and N of nodes) and save under the `extra` field of `rtMod`
`n.cores`	Integer: Number of cores to use.
`print.tune.plot`	Logical: passed to `randomForest::tuneRF`.
`print.plot`	Logical: if TRUE, produce plot using `mplot3`. Takes precedence over `plot.fitted` and `plot.predicted`.
`plot.fitted`	Logical: if TRUE, plot True (y) vs Fitted
`plot.predicted`	Logical: if TRUE, plot True (y.test) vs Predicted. Requires `x.test` and `y.test`
`plot.theme`	Character: "zero", "dark", "box", "darkbox"
`question`	Character: the question you are attempting to answer with this model, in plain language.
`grid.verbose`	Logical: Passed to `gridSearchLearn`
`verbose`	Logical: If TRUE, print summary to screen.
`outdir`	String, Optional: Path to directory to save output
`save.mod`	Logical: If TRUE, save all output to an RDS file in `outdir`. `save.mod` is TRUE by default if an `outdir` is defined. If set to TRUE and no `outdir` is defined, `outdir` defaults to `paste0("./s.", mod.name)`
`...`	Additional arguments to be passed to `ranger::ranger`
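The inverse frequency weighting options (`ifw`, `ifw.type`, `ifw.case.weights`) can be illustrated with a short base R sketch. It assumes that the unnormalized "type 0" weights are the inverse class frequencies, N / N_class. The documentation above does not define type 0, so treat that as an assumption. Types 1 and 2 then divide by the minimum or maximum weight, as documented. `ifw_class_weights` and `ifw_case_weights` are hypothetical helpers, not rtemis functions.

```r
# Sketch of inverse frequency weighting (IFW) for a factor outcome.
# ASSUMPTION: "type 0" weights are taken to be inverse class frequencies,
# N / N_class; the s_Ranger documentation does not define type 0.
ifw_class_weights <- function(y, type = 2) {
  y <- as.factor(y)
  freq <- as.vector(table(y)) / length(y)  # class proportions, in level order
  w <- 1 / freq                            # assumed type 0
  if (type == 1) w <- w / min(w)           # type 1: divide by min weight
  if (type == 2) w <- w / max(w)           # type 2: divide by max weight
  names(w) <- levels(y)
  w
}

# Expand class weights to one weight per case, the per-observation form
# that ranger's `case.weights` argument takes.
ifw_case_weights <- function(y, type = 2) {
  y <- as.factor(y)
  ifw_class_weights(y, type)[as.integer(y)]
}

y <- factor(c(rep("a", 80), rep("b", 20)))
ifw_class_weights(y, type = 0)  # a = 1.25, b = 5
ifw_class_weights(y, type = 1)  # a = 1,    b = 4
ifw_class_weights(y, type = 2)  # a = 0.25, b = 1
```

Whatever the normalization, each class ends up with the same total case weight (for type 2, 80 * 0.25 = 20 * 1 = 20). This equal total weight is what makes the weighting useful for imbalanced outcomes.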
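The `upsample` and `downsample` options rebalance the training set before training. The hypothetical helper `rebalance_index` below is a minimal base R sketch of that idea, not rtemis' implementation. Downsampling draws each larger class down to the size of the smallest one without replacement. Upsampling adds cases drawn with replacement until every class matches the largest one. Its `seed` argument plays a role similar to `resample.seed`.

```r
# Hypothetical helper (not rtemis code): returns row indices for a
# class-rebalanced training set.
rebalance_index <- function(y, method = c("down", "up"), seed = NULL) {
  method <- match.arg(method)
  y <- as.factor(y)
  if (!is.null(seed)) set.seed(seed)      # similar role to `resample.seed`
  by_class <- split(seq_along(y), y)      # row indices for each class
  sizes <- lengths(by_class)
  target <- if (method == "down") min(sizes) else max(sizes)
  idx <- lapply(by_class, function(i) {
    if (length(i) == target) return(i)    # already at the target size
    if (method == "down") {
      i[sample.int(length(i), target)]    # keep a random subset
    } else {
      extra <- i[sample.int(length(i), target - length(i), replace = TRUE)]
      c(i, extra)                         # originals plus resampled copies
    }
  })
  sort(unlist(idx, use.names = FALSE))
}

y <- factor(c(rep("a", 80), rep("b", 20)))
table(y[rebalance_index(y, "down", seed = 1)])  # a: 20, b: 20
table(y[rebalance_index(y, "up", seed = 1)])    # a: 80, b: 80
```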
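Parameters marked [gS] accept several values, and the grid search tries combinations of them. The sketch below builds a full grid with base R's `expand.grid`, then draws a random subset as a randomized search would with `gridsearch.randomized.p = 0.1`. The candidate values are arbitrary. Keeping `ceiling(p * n)` combinations is an assumption about rounding, not documented rtemis behavior.

```r
# Exhaustive grid over two [gS] parameters (values chosen for illustration)
grid <- expand.grid(mtry = c(2, 4, 6, 8),
                    min.node.size = c(1, 5, 10, 20, 50))
nrow(grid)  # 20: every combination is tried in an exhaustive search

# Randomized search: ASSUMPTION that ceiling(p * n) combinations are kept
p <- 0.1                                  # cf. gridsearch.randomized.p
set.seed(2024)
keep <- sample.int(nrow(grid), ceiling(p * nrow(grid)))
picked <- grid[keep, ]
nrow(picked)  # 2: combinations tried in a randomized search
```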

Details

You should consider, or try, setting mtry to NCOL(x), especially when the number of features is small. By default, mtry is set to NCOL(x) when NCOL(x) <= 20. For imbalanced datasets, setting stratify.on.y = TRUE can improve test set performance. If autotune = TRUE, randomForest::tuneRF will be run to determine the best mtry value.

[gS]: the indicated parameter will be tuned by grid search if more than one value is passed.
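The default `mtry` rule, combining the Details above with the `mtry` argument description, can be written as a small function. Rounding down and keeping at least one feature are assumptions made here for illustration. `default_mtry` is a hypothetical helper, not part of rtemis.

```r
# Sketch of the default mtry rule described in this page.
# ASSUMPTIONS: values are rounded down and kept >= 1; rtemis' exact
# rounding may differ.
default_mtry <- function(n_features, type = c("Classification", "Regression")) {
  type <- match.arg(type)
  if (n_features <= 20) return(n_features)  # few features: use all of them
  m <- if (type == "Classification") sqrt(n_features) else n_features / 3
  max(1L, as.integer(floor(m)))
}

default_mtry(10, "Classification")   # 10
default_mtry(100, "Classification")  # 10
default_mtry(100, "Regression")      # 33
```

Setting `mtry = NCOL(x)` makes every split consider all features, so the forest behaves like bagged trees. Smaller values decorrelate the trees at the cost of weaker individual splits.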

See Tech Report comparing balanced (ifw.case.weights = TRUE) and weighted (ifw.class.weights = TRUE) Random Forests.
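The two variants differ in where the weights act. ranger documents `case.weights` as weights for sampling training observations, so larger weights are drawn into each tree's bootstrap more often. `class.weights` instead enter the splitting rule (cost-sensitive learning). The base R sketch below mimics only the sampling side. It shows that inverse frequency case weights make a bootstrap sample roughly class-balanced. It is an illustration, not ranger's internal code.

```r
# Illustration of weighted bootstrap sampling with inverse frequency
# case weights (mimics the documented effect of ranger's case.weights).
set.seed(42)
y <- factor(c(rep("a", 900), rep("b", 100)))
freq <- as.vector(table(y)) / length(y)  # class proportions: 0.9, 0.1
case_w <- 1 / freq[as.integer(y)]        # inverse class frequency per case

plain    <- sample.int(length(y), replace = TRUE)
weighted <- sample.int(length(y), replace = TRUE, prob = case_w)

prop.table(table(y[plain]))     # roughly 0.9 / 0.1
prop.table(table(y[weighted]))  # roughly 0.5 / 0.5
```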

Value

rtMod object

Author(s)

E.D. Gennatas
