XGBoost grid search. Optionally pre-screens features with an initial xgboost model to eliminate weak ones, then follows up with an xgboost grid search over hyperparameters. Some hyperparameters have preset values, but others should be added as desired, especially reg_alpha and min_child_weight.
ezr.h2o_grid_xgb(train_df, valid_df = NULL, xvars = names(train_df),
  yvar = "target", grid_id = "xgb_grid", prescreenxgbm = TRUE,
  novalid_ok = FALSE, prescreen_keepvars_criteria = "number",
  prescreen_keepvars_threshold = 30, xval = TRUE, folds = 5,
  keep_cross_validation_predictions = FALSE, max_models = 1,
  learnrate = c(0.025), max_min_runtime = 15, ntrees = c(125),
  seed = 2018, max_depth = c(3, 5, 7, 9), colsample_bytree = c(1,
  0.5, 0.8), sample_rate = c(1, 0.8, 0.6), gamma = c(0, 1),
  reg_lambda = c(0, 0.5, 0.25), ...)
| Argument | Description |
| --- | --- |
| train_df | Training dataframe. |
| valid_df | Validation dataframe. If not provided, the training dataframe is split 80/20 for you. |
| xvars | The x variables (predictors) in the model. |
| yvar | The target variable. |
| grid_id | Name of the grid ID. |
| prescreenxgbm | Use a prescreen? This runs an xgboost model first and keeps a selected number of features for the final model. It is intended to speed up modeling and to avoid modeling with obviously worthless data. |
| novalid_ok | If TRUE, run the model on the training dataset only, without a validation set. |
| prescreen_keepvars_criteria | Either "percent" or "number". "percent" keeps variables that contribute at least the threshold percentage (documented as 0.005 by default, although the usage shows prescreen_keepvars_threshold = 30, which fits the "number" default). "number" keeps the top N variables. |
| prescreen_keepvars_threshold | The percentage threshold or integer number of variables to keep when a prescreen model is used. |
| xval | Use cross-validation, TRUE/FALSE. |
| folds | Number of folds if cross-validation is used. |
| keep_cross_validation_predictions | Keep the cross-validation predictions? Defaults to FALSE. |
| max_models | Defaults to 1. |
| learnrate | Defaults to 0.025. |
| max_min_runtime | Defaults to 15 minutes. Enter this in minutes, not seconds. |
| ntrees | Defaults to 125. |
| seed | Defaults to 2018. |
| max_depth | Defaults to a grid search of 3, 5, 7, 9. |
| colsample_bytree | Defaults to a grid search of 1, 0.5, 0.8. |
| sample_rate | Defaults to a grid search of 1, 0.8, 0.6. |
| gamma | Defaults to a grid search of 0, 1. Please tune. |
| reg_lambda | L2 regularization. Defaults to a grid search of 0, 0.5, 0.25. For L1 regularization, pass reg_alpha through `...`. |
| ... | Additional hyperparameters to search over, such as reg_alpha or min_child_weight. |
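The two prescreen_keepvars_criteria modes can be illustrated with a small base R sketch. The helper `select_prescreen_vars` and the `importance` data frame are hypothetical and not part of ezr. The sketch also assumes that "percent" compares each variable's share of total importance against the threshold, which this page does not spell out.

```r
# Hypothetical helper, not part of ezr: choose variables from a
# variable-importance table using the two documented criteria.
select_prescreen_vars <- function(importance, criteria = "number", threshold = 30) {
  # Sort from most to least important
  importance <- importance[order(-importance$importance), , drop = FALSE]
  if (criteria == "number") {
    # Keep the top N variables (or all of them if fewer than N exist)
    keep <- head(importance$variable, threshold)
  } else if (criteria == "percent") {
    # Assumption: "percent" means share of total importance
    share <- importance$importance / sum(importance$importance)
    keep <- importance$variable[share >= threshold]
  } else {
    stop("criteria must be 'number' or 'percent'")
  }
  keep
}

# Made-up importances, for illustration only
importance <- data.frame(
  variable = c("age", "income", "zip", "noise"),
  importance = c(60, 30, 9.9, 0.1),
  stringsAsFactors = FALSE
)

top2 <- select_prescreen_vars(importance, criteria = "number", threshold = 2)
by_share <- select_prescreen_vars(importance, criteria = "percent", threshold = 0.005)
```

With these made-up values, the "number" mode keeps the two highest-ranked variables, while the "percent" mode drops only the variable whose share falls below 0.005.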
Hyperparameters should be tuned! The ones preset to search over are provided for convenience only.
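To get a feel for how quickly the search space grows when more hyperparameters are added, the preset values can be expanded with base R. This is only a counting sketch: it does not call ezr or H2O, it says nothing about how the grid is actually sampled under max_models or max_min_runtime, and the reg_alpha and min_child_weight values are illustrative assumptions.

```r
# Preset hyperparameter values copied from the usage signature
preset <- list(
  learnrate = c(0.025),
  ntrees = c(125),
  max_depth = c(3, 5, 7, 9),
  colsample_bytree = c(1, 0.5, 0.8),
  sample_rate = c(1, 0.8, 0.6),
  gamma = c(0, 1),
  reg_lambda = c(0, 0.5, 0.25)
)

# One row per combination of preset values
grid <- expand.grid(preset)
n_preset <- nrow(grid)

# Illustrative extra values for the two hyperparameters the
# description recommends adding (assumed, not package defaults)
extended <- c(preset, list(reg_alpha = c(0, 0.5), min_child_weight = c(1, 5)))
n_extended <- nrow(expand.grid(extended))
```

The presets alone give 216 combinations, and adding two values each for reg_alpha and min_child_weight raises that to 864, which is why a runtime or model-count limit matters.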
A grid of searched models.