ezr.h2o_gbm_grid: GBM Grid Search
In jmp1989/easyr: Helpful wrappers for common EDA, Data Manipulation, & Modeling

Usage

ezr.h2o_gbm_grid(train_df, valid_df = NULL, xvars = names(train_df),
  yvar = "target", grid_id = "gbm_grid", prescreengbm = TRUE,
  novalid_ok = FALSE, prescreen_keepvars_criteria = "percent",
  prescreen_keepvars_threshold = 0.005, xval = TRUE, folds = 5,
  keep_cross_validation_predictions = TRUE, max_models = 1,
  learnrate = 0.025, max_min_runtime = 15, ntrees = 125,
  seed = 2018, ...)

Arguments

train_df

An H2OFrame containing the training data.

valid_df

A validation H2OFrame. Default is NULL. If NULL, train_df is split 80/20 and the 20% portion is used as the validation set (see novalid_ok).

xvars

Names of the predictor variables. Default is every column of train_df.

yvar

Name of the target column. Default is "target".

grid_id

Grid ID to use. Default is "gbm_grid".

prescreengbm

Default is TRUE. Should a pre-screen be run to eliminate excess variables? If so, a GBM with default parameters is trained first, and its variable importances are used to drop variables before re-training. This guards against a model carrying hundreds of variables with negligible importance (e.g., 0.001).

prescreen_keepvars_criteria

Valid values are 'percent' and 'number'. Default is 'percent', which selects variables by percentage importance. 'number' selects a fixed count of variables, such as 5, 10, or 100.

prescreen_keepvars_threshold

For 'percent', the minimum importance a variable needs to be retained; the default is 0.005. For 'number', enter an integer count. If the value is <= 1 and prescreen_keepvars_criteria is 'number', the count defaults to 25.

xval

Should cross-validation be used? Default is TRUE.

folds

Number of cross-validation folds. Default is 5.

keep_cross_validation_predictions

Should the cross-validation predictions be kept? Default is TRUE.

max_models

Maximum number of models to build. Default is 1. If the value is 1, a single default GBM is run.

learnrate

Learning rate. Default is 0.025. A vector such as c(0.01, 0.05) can be supplied.

max_min_runtime

Maximum run time, in minutes. Default is 15.

ntrees

Number of trees. Default is 125.

seed

Random seed. Default is 2018.

...

Additional arguments.

novalid_ok

Default is FALSE. If TRUE, no validation set is used when only train_df is supplied.
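The pre-screen selection rules described by prescreen_keepvars_criteria and prescreen_keepvars_threshold can be illustrated with a small base R sketch. This is an assumption about how the rules combine, not easyr's implementation: select_prescreen_vars() is a hypothetical helper, the input mimics the variable, relative_importance, scaled_importance, and percentage columns that h2o.varimp() returns for a GBM, the importance values are made-up toy numbers, and the inclusive (>=) comparison for 'percent' is assumed.

```r
# Hypothetical helper, NOT part of easyr: one plausible reading of the
# prescreen_keepvars_criteria / prescreen_keepvars_threshold rules.
# `varimp` mimics h2o.varimp() output for a GBM
# (variable, relative_importance, scaled_importance, percentage).
select_prescreen_vars <- function(varimp,
                                  criteria = c("percent", "number"),
                                  threshold = 0.005) {
  criteria <- match.arg(criteria)
  # Sort from most to least important so "number" can take the top rows
  varimp <- varimp[order(varimp$percentage, decreasing = TRUE), ]
  if (criteria == "percent") {
    # Keep variables whose share of total importance meets the threshold
    return(varimp$variable[varimp$percentage >= threshold])
  }
  # "number": a threshold <= 1 is not a usable count, so fall back to 25
  n_keep <- if (threshold <= 1) 25 else as.integer(threshold)
  head(varimp$variable, n_keep)
}

# Toy importance table (made-up values, not real model output)
vi <- data.frame(
  variable = c("var_a", "var_b", "var_c", "var_d"),
  relative_importance = c(50, 30, 19.8, 0.2),
  stringsAsFactors = FALSE
)
vi$scaled_importance <- vi$relative_importance / max(vi$relative_importance)
vi$percentage <- vi$relative_importance / sum(vi$relative_importance)

select_prescreen_vars(vi, "percent", 0.005)  # "var_a" "var_b" "var_c"
select_prescreen_vars(vi, "number", 2)       # "var_a" "var_b"
```

With the 'percent' rule, var_d (0.2% of total importance) falls below the 0.005 threshold and is dropped; with 'number', a threshold of 0.5 would fall back to keeping up to 25 variables.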

Value

Returns a grid of models.

Description

Off-the-shelf grid search for GBM with hyperparameters.

Examples

library(h2o)
h2o.init()
h2odf = as.h2o(dataset_telco_churn_from_kaggle)
example_grid_search = ezr.h2o_gbm_grid(train_df = h2odf, yvar = 'Churn', max_models = 11)
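The following base R sketch shows how vector-valued arguments such as learnrate could expand into candidate hyperparameter combinations, how max_models could cap that set, and how max_min_runtime converts to seconds. It is an assumption, not easyr's code. h2o.grid() does accept a search_criteria list with strategy = "RandomDiscrete", max_models, and max_runtime_secs, but whether ezr.h2o_gbm_grid uses exactly that setup is not stated here. build_candidate_grid() and max_runtime_secs() are hypothetical helpers.

```r
# Hypothetical sketch (not easyr's code): expand vector-valued hyperparameters
# into a Cartesian grid, then cap it at max_models by random sampling,
# similar in spirit to h2o's "RandomDiscrete" search strategy.
build_candidate_grid <- function(learnrate = 0.025, ntrees = 125,
                                 max_models = 1, seed = 2018) {
  grid <- expand.grid(learn_rate = learnrate, ntrees = ntrees)
  # Small grids are returned whole
  if (nrow(grid) <= max_models) return(grid)
  set.seed(seed)
  # Otherwise sample max_models distinct combinations
  grid[sample(nrow(grid), max_models), , drop = FALSE]
}

# Minutes to seconds, the unit of h2o's max_runtime_secs
max_runtime_secs <- function(max_min_runtime = 15) max_min_runtime * 60

cands <- build_candidate_grid(learnrate = c(0.01, 0.025, 0.05),
                              ntrees = c(100, 125), max_models = 4)
nrow(cands)           # 4 of the 6 possible combinations
max_runtime_secs(15)  # 900
```

With the defaults (a single learnrate, a single ntrees, max_models = 1), the sketch yields exactly one combination, which matches the documented behavior that max_models = 1 runs a single default GBM.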

jmp1989/easyr documentation built on May 20, 2019, 7:25 a.m.