ezr.h2o_gbm_grid: GBM Grid Search

Usage

ezr.h2o_gbm_grid(train_df, valid_df = NULL, xvars = names(train_df),
  yvar = "target", grid_id = "gbm_grid", prescreengbm = TRUE,
  novalid_ok = FALSE, prescreen_keepvars_criteria = "percent",
  prescreen_keepvars_threshold = 0.005, xval = TRUE, folds = 5,
  keep_cross_validation_predictions = TRUE, max_models = 1,
  learnrate = 0.025, max_min_runtime = 15, ntrees = 125,
  seed = 2018, ...)

Arguments

train_df: h2o dataframe


valid_df: A validation dataframe. Default is NULL. If NULL, the train_df will be split into an 80/20 split and the 20


xvars: Default is everything in the training dataframe.




grid_id: Grid id to use. Default is gbm_grid.


prescreengbm: Default is TRUE. Should a pre-screen be run to eliminate excess variables? This runs a GBM with default parameters and uses its variable importances to eliminate variables before re-training. This guards against a model carrying hundreds of variables with importance of 0.001 or similar.


prescreen_keepvars_criteria: Valid values are 'percent' and 'number'. Default is 'percent', meaning percent importance. 'number' refers to how many variables to keep, such as 5, 10, or 100.


prescreen_keepvars_threshold: Retention threshold. Default is 0.005, interpreted as a percent importance. Enter an integer when prescreen_keepvars_criteria is 'number'. If the value is <= 1 and prescreen_keepvars_criteria is 'number', this defaults to 25.


xval: Default is TRUE.


folds: Default is 5.


keep_cross_validation_predictions: Default is TRUE.


max_models: Default is 1. If the value is 1, a default GBM will run.


learnrate: Default is 0.025. You can enter a vector such as c(0.01, 0.05).


max_min_runtime: How many minutes can this run for? Default is 15 minutes.


ntrees: Default is 125.


seed: Default is 2018.


...: Additional inputs...


novalid_ok: FALSE by default. If TRUE, no validation dataset is used when only a training dataset is entered.
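The pre-screen retention rule described by prescreen_keepvars_criteria and prescreen_keepvars_threshold can be illustrated with a small base R sketch. This is not the package's implementation: `prescreen_keep` is a hypothetical helper, the importance table and its variable names are made-up illustration data, and whether the real function uses `>=` or `>` at the threshold is an assumption.

```r
# Hypothetical helper (not part of easyr): pick the variables that survive
# a pre-screen, given a variable-importance table.
# `varimp` is assumed to have columns `variable` and `percentage`,
# where `percentage` is the share of total importance (0 to 1).
prescreen_keep <- function(varimp,
                           criteria = c("percent", "number"),
                           threshold = 0.005) {
  criteria <- match.arg(criteria)
  # Rank variables from most to least important.
  ranked <- varimp[order(-varimp$percentage), , drop = FALSE]
  if (criteria == "percent") {
    # Keep every variable whose importance share meets the threshold
    # (inclusive comparison is an assumption).
    keep <- ranked$variable[ranked$percentage >= threshold]
  } else {
    # A threshold <= 1 is not a usable count, so fall back to 25,
    # as the argument description states.
    n <- if (threshold <= 1) 25 else as.integer(threshold)
    keep <- head(ranked$variable, n)
  }
  as.character(keep)
}

# Made-up importance table for illustration only.
varimp <- data.frame(
  variable   = c("tenure", "contract", "charges", "gender"),
  percentage = c(0.60, 0.30, 0.097, 0.003),
  stringsAsFactors = FALSE
)

kept_by_percent <- prescreen_keep(varimp, "percent", 0.005)
kept_by_number  <- prescreen_keep(varimp, "number", 2)
print(kept_by_percent)
print(kept_by_number)
```

With 'percent', the variable whose share falls below 0.005 is dropped; with 'number' and a threshold of 2, only the two highest-ranked variables are kept.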

Returns a grid of models.

Off-the-shelf grid search for GBM with hyperparameters.

library(h2o)
h2o.init()
h2odf = as.h2o(dataset_telco_churn_from_kaggle)
example_grid_search = ezr.h2o_gbm_grid(train_df = h2odf, yvar = 'Churn', max_models = 11)
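For readers who want a rough picture of what the wrapper's arguments could correspond to in plain h2o, the sketch below shows one possible mapping. It is an assumption, not the package's source: `build_search_criteria`, `gbm_grid_sketch`, and `best_model_sketch` are hypothetical names, the "RandomDiscrete" strategy is a guess, and sorting by "auc" assumes a binary target such as Churn. The h2o calls used (`h2o.splitFrame`, `h2o.grid`, `h2o.getGrid`, `h2o.getModel`) exist in the h2o R package, but the two sketch functions need a running h2o cluster and are only defined here, not called.

```r
# Assumed translation of the wrapper's arguments into h2o search criteria.
# max_min_runtime is given in minutes; h2o expects seconds.
build_search_criteria <- function(max_models = 1, max_min_runtime = 15,
                                  seed = 2018) {
  list(
    strategy         = "RandomDiscrete",  # assumption, not confirmed by the docs
    max_models       = max_models,
    max_runtime_secs = max_min_runtime * 60,
    seed             = seed
  )
}

# Sketch of a direct GBM grid search on an H2OFrame (requires h2o.init()).
gbm_grid_sketch <- function(train_df, yvar,
                            xvars = setdiff(names(train_df), yvar),
                            learnrate = c(0.01, 0.05),
                            ntrees = 125, folds = 5, max_models = 11) {
  # Approximate 80/20 split, standing in for the valid_df = NULL case.
  parts <- h2o::h2o.splitFrame(train_df, ratios = 0.8, seed = 2018)
  h2o::h2o.grid(
    algorithm        = "gbm",
    grid_id          = "gbm_grid",
    x                = xvars,
    y                = yvar,
    training_frame   = parts[[1]],
    validation_frame = parts[[2]],
    hyper_params     = list(learn_rate = learnrate),
    search_criteria  = build_search_criteria(max_models = max_models),
    ntrees           = ntrees,
    nfolds           = folds,
    keep_cross_validation_predictions = TRUE,
    seed             = 2018
  )
}

# Sketch of pulling the top model out of a finished grid.
best_model_sketch <- function(grid_id = "gbm_grid", metric = "auc") {
  sorted <- h2o::h2o.getGrid(grid_id = grid_id, sort_by = metric,
                             decreasing = TRUE)
  h2o::h2o.getModel(sorted@model_ids[[1]])
}

# The pure helper can be checked without a cluster.
criteria <- build_search_criteria(max_models = 11, max_min_runtime = 15)
print(criteria$max_runtime_secs)
```

If the object returned by ezr.h2o_gbm_grid is a standard h2o grid, a call along the lines of `best_model_sketch("gbm_grid")` would retrieve its best model by validation AUC.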

jmp1989/easyr documentation built on May 20, 2019, 7:25 a.m.