s_H2ORF: Random Forest on H2O (C, R)

View source: R/s_H2ORF.R

s_H2ORF    R Documentation

Random Forest on H2O (C, R)

Description

Trains a Random Forest model using H2O (http://www.h2o.ai).

Usage

s_H2ORF(
  x,
  y = NULL,
  x.test = NULL,
  y.test = NULL,
  x.valid = NULL,
  y.valid = NULL,
  x.name = NULL,
  y.name = NULL,
  ip = "localhost",
  port = 54321,
  n.trees = 500,
  max.depth = 20,
  n.stopping.rounds = 0,
  mtry = -1,
  nfolds = 0,
  weights = NULL,
  balance.classes = TRUE,
  upsample = FALSE,
  downsample = FALSE,
  resample.seed = NULL,
  na.action = na.fail,
  h2o.shutdown.at.end = TRUE,
  n.cores = rtCores,
  print.plot = FALSE,
  plot.fitted = NULL,
  plot.predicted = NULL,
  plot.theme = rtTheme,
  question = NULL,
  verbose = TRUE,
  trace = 0,
  save.mod = FALSE,
  outdir = NULL,
  ...
)
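
As a rough sketch of how the signature above might be used, the following builds a small synthetic classification dataset and passes training and testing sets to s_H2ORF. The data, the train/test split, and the chosen n.trees and max.depth values are illustrative assumptions, not recommendations. The call is guarded so that it only runs when both rtemis and h2o are installed (h2o also needs a working Java runtime), and it assumes an H2O server can be reached or started at the default ip and port.

```r
# Synthetic binary classification data (illustrative only)
set.seed(2024)
n <- 200
x <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
y <- factor(ifelse(x$a + 0.5 * x$b + rnorm(n, sd = 0.5) > 0, "pos", "neg"))

# Hold out 50 cases for testing
train <- sample(n, 150)

# Fit only if both packages are available
if (requireNamespace("rtemis", quietly = TRUE) &&
    requireNamespace("h2o", quietly = TRUE)) {
  mod <- rtemis::s_H2ORF(
    x = x[train, ], y = y[train],
    x.test = x[-train, ], y.test = y[-train],
    n.trees = 100,   # fewer trees than the default of 500, to keep the run short
    max.depth = 10
  )
  print(mod)         # mod is the rtMod object described under Value
}
```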

Arguments

x

Training set features

y

Training set outcome

x.test

Testing set features (Used to evaluate model performance)

y.test

Testing set outcome

x.valid

Validation set features (Used to build model / tune hyperparameters)

y.valid

Validation set outcome

x.name

Character: Name for feature set

y.name

Character: Name for outcome

ip

Character: IP address of H2O server. Default = "localhost"

port

Integer: Port to connect to at ip

n.trees

Integer: Number of trees to grow

max.depth

Integer: Maximum tree depth

n.stopping.rounds

Integer: Early stopping if simple moving average of this many rounds does not improve. Set to 0 to disable early stopping.
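
The moving-average rule described above can be illustrated in base R. This is a simplified sketch, not H2O's implementation: the function name should_stop is hypothetical, the metric is assumed to be one where lower is better, and no tolerance is applied. It compares the best of the k most recent moving averages with the moving average immediately preceding them.

```r
# Simplified moving-average early-stopping check (lower metric = better).
# Illustration only; H2O's actual stopping logic may differ in detail.
should_stop <- function(metric, k) {
  if (k == 0) return(FALSE)                  # k = 0 disables early stopping
  if (length(metric) < 2 * k) return(FALSE)  # not enough scoring events yet
  # Simple moving averages over windows of length k
  n_ma <- length(metric) - k + 1
  ma <- vapply(
    seq_len(n_ma),
    function(i) mean(metric[i:(i + k - 1)]),
    numeric(1)
  )
  # Best of the k most recent averages vs. the average just before them
  recent <- ma[(n_ma - k + 1):n_ma]
  reference <- ma[n_ma - k]
  min(recent) >= reference
}

improving <- c(1.0, 0.9, 0.8, 0.7, 0.6, 0.5)
plateau <- c(1.0, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7)
should_stop(improving, k = 2)  # FALSE: the metric keeps improving
should_stop(plateau, k = 2)    # TRUE: no improvement over the last 2 averages
```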

mtry

Integer: Number of variables randomly sampled and considered for splitting at each round. If set to -1, defaults to sqrt(N_features) for classification and N_features/3 for regression.
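
To make the mtry = -1 default concrete, here is a small base R sketch of the rule stated above. The helper default_mtry is hypothetical, and rounding down with a minimum of 1 is an assumption of this sketch; the exact rounding used by H2O is not stated here.

```r
# Number of candidate features per split when mtry = -1.
# Rounding down and the lower bound of 1 are assumptions of this sketch.
default_mtry <- function(n_features,
                         type = c("Classification", "Regression")) {
  type <- match.arg(type)
  raw <- if (type == "Classification") sqrt(n_features) else n_features / 3
  max(1L, as.integer(floor(raw)))
}

default_mtry(100, "Classification")  # sqrt(100) = 10
default_mtry(30, "Regression")       # 30 / 3 = 10
```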

nfolds

Integer: Number of folds for K-fold CV used by h2o.randomForest. Set to 0 to disable. Included for experimentation only; use train_cv for outer resampling.

weights

Numeric vector: Weights for cases. For classification, weights takes precedence over ifw: if weights are provided, ifw is not used. Leave weights = NULL if setting ifw = TRUE.

balance.classes

Logical: If TRUE, h2o.randomForest will over/undersample to balance data. Included for experimentation only.

upsample

Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness.

downsample

Logical: If TRUE, downsample majority class to match size of minority class

resample.seed

Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed)
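
The idea behind upsample, downsample, and resample.seed can be sketched in base R. This is not the rtemis implementation: the helpers upsample_index and downsample_index are hypothetical, and the upsampling here always draws the extra cases with replacement rather than following the doubling rule noted under upsample. Each helper returns row indices that can be used to subset both x and y.

```r
# Upsample: add randomly drawn cases (with replacement) to each smaller class
# until it matches the size of the largest class.
upsample_index <- function(y, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # fixed seed gives a reproducible resample
  y <- factor(y)
  target <- max(table(y))
  unlist(lapply(levels(y), function(lev) {
    i <- which(y == lev)
    extra <- i[sample.int(length(i), target - length(i), replace = TRUE)]
    c(i, extra)
  }))
}

# Downsample: keep a random subset of each class, sized to the smallest class.
downsample_index <- function(y, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  y <- factor(y)
  target <- min(table(y))
  unlist(lapply(levels(y), function(lev) {
    i <- which(y == lev)
    i[sample.int(length(i), target)]
  }))
}

y <- factor(c(rep("a", 8), rep("b", 3)))
table(y[upsample_index(y, seed = 2024)])    # 8 cases in each class
table(y[downsample_index(y, seed = 2024)])  # 3 cases in each class
```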

na.action

How to handle missing values. See ?na.fail

h2o.shutdown.at.end

Logical: If TRUE, run h2o.shutdown(prompt = FALSE) after training is complete.

n.cores

Integer: Number of cores to use

print.plot

Logical: if TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted.

plot.fitted

Logical: if TRUE, plot True (y) vs Fitted

plot.predicted

Logical: if TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test

plot.theme

Character: "zero", "dark", "box", "darkbox"

question

Character: the question you are attempting to answer with this model, in plain language.

verbose

Logical: If TRUE, print summary to screen.

trace

Integer: If higher than 0, will print more information to the console.

save.mod

Logical: If TRUE, save all output to an RDS file in outdir. save.mod is TRUE by default if an outdir is defined. If set to TRUE and no outdir is defined, outdir defaults to paste0("./s.", mod.name).

outdir

Path to output directory. If defined, will save Predicted vs. True plot, if available, as well as full model output, if save.mod is TRUE

...

Additional parameters to pass to h2o::h2o.randomForest

Value

rtMod object

Author(s)

E.D. Gennatas

See Also

train_cv for external cross-validation

Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()

Other Tree-based methods: s_AdaBoost(), s_AddTree(), s_BART(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GBM(), s_GLMTree(), s_H2OGBM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MLRF(), s_RF(), s_RFSRC(), s_Ranger(), s_XGBoost(), s_XRF()


egenn/rtemis documentation built on Nov. 22, 2024, 4:12 a.m.