ezr.h2o_grid_xgb: Xgboost Grid
In jmp1989/easyr: Helpful wrappers for common EDA, Data Manipulation, & Modeling

Xgboost grid search. Optionally pre-screens features with an initial xgboost model to eliminate weak ones, then follows up with an xgboost grid search over hyperparameters. Some hyperparameters have preset values, but others should be added as desired, especially reg_alpha and min_child_weight.

ezr.h2o_grid_xgb(train_df, valid_df = NULL, xvars = names(train_df),
  yvar = "target", grid_id = "xgb_grid", prescreenxgbm = TRUE,
  novalid_ok = FALSE, prescreen_keepvars_criteria = "number",
  prescreen_keepvars_threshold = 30, xval = TRUE, folds = 5,
  keep_cross_validation_predictions = FALSE, max_models = 1,
  learnrate = c(0.025), max_min_runtime = 15, ntrees = c(125),
  seed = 2018, max_depth = c(3, 5, 7, 9), colsample_bytree = c(1,
  0.5, 0.8), sample_rate = c(1, 0.8, 0.6), gamma = c(0, 1),
  reg_lambda = c(0, 0.5, 0.25), ...)
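To get a feel for the size of the default search space, the vectors in the signature above can be expanded in base R. This is only a sketch of the arithmetic: the values are copied from the defaults, but how the wrapper hands them to H2O, and how `max_models` limits what is actually fitted, is not shown on this page.

```r
# Default search space, copied from the Usage signature above.
hyper_params <- list(
  learnrate        = c(0.025),
  ntrees           = c(125),
  max_depth        = c(3, 5, 7, 9),
  colsample_bytree = c(1, 0.5, 0.8),
  sample_rate      = c(1, 0.8, 0.6),
  gamma            = c(0, 1),
  reg_lambda       = c(0, 0.5, 0.25)
)

# Enumerate every combination (Cartesian product) of the values above.
grid <- expand.grid(hyper_params)
n_combinations <- nrow(grid)
print(n_combinations)  # 216
```

With 216 candidate combinations and `max_models = 1` by default, raising `max_models` or narrowing the vectors is likely worth considering before a longer run.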

`train_df`	Training dataframe
`valid_df`	Validation dataframe. If not provided, the training dataframe is split 80/20 for you.
`xvars`	The predictor (x) variables in the model. Defaults to all columns of `train_df`.
`yvar`	The target variable
`grid_id`	ID (name) of the grid
`prescreenxgbm`	Use a prescreen? If TRUE, an initial xgboost model is run and a selected number of its features are used in the final model. This is intended to speed up modeling and to avoid modeling with obviously worthless data.
`novalid_ok`	Run a model on the training dataset only.
`prescreen_keepvars_criteria`	Values are "percent" or "number". "percent" picks variables that contribute at least the threshold percentage, which is set at 0.005 by default. "number" picks the top N best variables.
`prescreen_keepvars_threshold`	The percentage threshold, or the integer number of variables, to keep when a prescreen model is used.
`xval`	Cross validation, TRUE/FALSE
`folds`	Number of folds if cross-validation is used
`keep_cross_validation_predictions`	Keep the cross-validation predictions? Defaults to FALSE.
`max_models`	Defaults to 1
`learnrate`	Defaults to 0.025
`max_min_runtime`	Maximum runtime in minutes. Defaults to 15. Remember to enter this as minutes, not seconds.
`ntrees`	Defaults to 125
`seed`	Defaults to 2018
`max_depth`	Defaults to a grid search of 3,5,7,9
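`colsample_bytree`	Defaults to a grid search of 1, 0.5, 0.8
`sample_rate`	Defaults to a grid search of 1, 0.8, 0.6
`gamma`	Defaults to a grid search of 0, 1. Please tune.
`reg_lambda`	L2 regularization. Defaults to a grid search of 0, 0.5, 0.25. L1 regularization is reg_alpha; please pass it in under `...`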
`...`	Additional hyperparameters
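The two values of `prescreen_keepvars_criteria` described above can be illustrated with a small base R sketch. It is not the package's implementation: `keep_vars` is a hypothetical helper, the importance table is made up, and it assumes importances are expressed as fractions of the total.

```r
# Hypothetical variable-importance table; names and values are made up.
varimp <- data.frame(
  variable   = c("age", "income", "tenure", "region", "noise"),
  percentage = c(0.50, 0.30, 0.15, 0.047, 0.003),
  stringsAsFactors = FALSE
)

# Illustrative helper (not part of easyr): pick variables by either rule.
keep_vars <- function(varimp, criteria = c("number", "percent"), threshold) {
  criteria <- match.arg(criteria)
  # Sort by importance, most important first.
  ranked <- varimp[order(varimp$percentage, decreasing = TRUE), ]
  if (criteria == "number") {
    # Keep the top N variables (or all of them if fewer than N exist).
    head(ranked$variable, threshold)
  } else {
    # Keep variables whose share of importance is at least `threshold`.
    ranked$variable[ranked$percentage >= threshold]
  }
}

top3  <- keep_vars(varimp, "number", 3)
above <- keep_vars(varimp, "percent", 0.005)
print(top3)
print(above)
```

In this toy table, "number" with a threshold of 3 keeps the three strongest variables, while "percent" with 0.005 drops only the variable contributing less than that share.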