ezr.h2o_grid_xgb: Xgboost Grid


Description

Xgboost grid search. Allows for pre-screening with an xgboost model to eliminate features, followed by an xgboost grid search over hyperparameters. Some hyperparameters have preset values, but others should be added as desired, especially reg_alpha and min_child_weight.

Usage

ezr.h2o_grid_xgb(train_df, valid_df = NULL, xvars = names(train_df),
  yvar = "target", grid_id = "xgb_grid", prescreenxgbm = TRUE,
  novalid_ok = FALSE, prescreen_keepvars_criteria = "number",
  prescreen_keepvars_threshold = 30, xval = TRUE, folds = 5,
  keep_cross_validation_predictions = FALSE, max_models = 1,
  learnrate = c(0.025), max_min_runtime = 15, ntrees = c(125),
  seed = 2018, max_depth = c(3, 5, 7, 9), colsample_bytree = c(1,
  0.5, 0.8), sample_rate = c(1, 0.8, 0.6), gamma = c(0, 1),
  reg_lambda = c(0, 0.5, 0.25), ...)
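
A call using only arguments from the signature above might look like the following. This is a minimal sketch, not a tested recipe: the toy data and column names are made up, the package name `easyr` is assumed from the repository name, and it is assumed that a plain data frame is accepted once an h2o cluster is running. The target is excluded from xvars explicitly, since the default is names(train_df).

```r
# Toy data; the column names x1, x2 and the values are made up for illustration.
set.seed(2018)
train_df <- data.frame(
  x1 = rnorm(200),
  x2 = runif(200),
  target = factor(sample(c("no", "yes"), 200, replace = TRUE))
)

# Arguments taken from the usage signature; the values are illustrative.
grid_args <- list(
  train_df = train_df,
  xvars = setdiff(names(train_df), "target"),  # exclude the target explicitly
  yvar = "target",
  prescreenxgbm = FALSE,   # skip the prescreen for a two-variable toy set
  max_models = 5,
  max_min_runtime = 5      # minutes, not seconds
)

# Only attempt the call when both packages are installed.
# Assumption: the package is named "easyr" and exports ezr.h2o_grid_xgb.
if (requireNamespace("easyr", quietly = TRUE) &&
    requireNamespace("h2o", quietly = TRUE)) {
  h2o::h2o.init()
  grid <- do.call(easyr::ezr.h2o_grid_xgb, grid_args)
}
```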

Arguments

train_df

Training dataframe

valid_df

Validation dataframe. If not provided, the training dataframe is split 80/20 for you.

xvars

The predictor (x) variables in the model. Defaults to all columns of train_df.

yvar

The target variable

grid_id

ID to assign to the grid. Defaults to "xgb_grid".

prescreenxgbm

Use a prescreen? This runs an xgb model first, then selects a number of its features to use in the final model. It is intended to speed up the modeling process and to avoid modeling with obviously worthless data.

novalid_ok

Run a model on the training dataset only, without a validation set.

prescreen_keepvars_criteria

Values are "percent" or "number". "percent" picks variables that contribute at least ___ percent, which is set at 0.005 by default. "number" picks the top N variables.

prescreen_keepvars_threshold

The percentage threshold, or the integer number of variables to keep, if you use a prescreen model. Defaults to 30.

xval

Use cross validation? TRUE/FALSE. Defaults to TRUE.

folds

Number of folds if you use cross validation. Defaults to 5.

keep_cross_validation_predictions

Keep the cross-validation predictions? Defaults to FALSE.

max_models

Maximum number of models to build in the grid search. Defaults to 1.

learnrate

Learning rate. Defaults to 0.025.

max_min_runtime

Maximum runtime in minutes. Defaults to 15. Remember to enter this as minutes, not seconds.

ntrees

Number of trees. Defaults to 125.

seed

Random seed. Defaults to 2018.

max_depth

Maximum tree depth. Defaults to a grid search of 3, 5, 7, 9.

colsample_bytree

Defaults to a grid search of 1, 0.5, 0.8.

sample_rate

Defaults to a grid search of 1, 0.8, 0.6.

gamma

Defaults to a grid search of 0 and 1. Please tune.

reg_lambda

L2 regularization. Defaults to a grid search of 0, 0.5, 0.25. L1 regularization is reg_alpha; pass it in under ...

...

Additional hyperparameters to search over, for example reg_alpha or min_child_weight.
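
The two prescreen_keepvars_criteria modes described above can be illustrated in base R. This is a sketch under stated assumptions, not the package's implementation: the importance table and the helper `keep_vars` are hypothetical, and it is assumed that "percent" compares each variable's share of total importance (expressed as a fraction) against the threshold.

```r
# Hypothetical variable-importance table; names and values are made up.
varimp <- data.frame(
  variable = c("age", "income", "tenure", "zip", "noise"),
  percentage = c(0.50, 0.30, 0.17, 0.027, 0.003),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of easyr.
keep_vars <- function(varimp, criteria = c("number", "percent"), threshold) {
  criteria <- match.arg(criteria)
  # Rank variables from most to least important.
  ranked <- varimp[order(varimp$percentage, decreasing = TRUE), ]
  if (criteria == "number") {
    # Keep the top N variables.
    head(ranked$variable, threshold)
  } else {
    # Keep variables whose share of importance is at least the threshold.
    ranked$variable[ranked$percentage >= threshold]
  }
}

top3 <- keep_vars(varimp, "number", threshold = 3)
above <- keep_vars(varimp, "percent", threshold = 0.005)
```

Here `top3` holds the three highest-ranked variables, while `above` drops only the variable whose share falls below 0.005.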

Details

Hyper parameters should be tuned! The ones preset to search over are available for convenience only.
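
To get a feel for the size of the preset search space, the default vectors from the usage signature can be expanded in base R. The reg_alpha and min_child_weight values below are illustrative assumptions, not recommendations from the package; they only show how quickly the grid grows once extra hyperparameters are added.

```r
# Preset hyperparameter vectors, copied from the usage signature.
hyper_params <- list(
  max_depth = c(3, 5, 7, 9),
  colsample_bytree = c(1, 0.5, 0.8),
  sample_rate = c(1, 0.8, 0.6),
  gamma = c(0, 1),
  reg_lambda = c(0, 0.5, 0.25)
)

# Number of distinct combinations in the preset grid.
n_default <- nrow(expand.grid(hyper_params))

# Illustrative extra hyperparameters (values are assumptions).
hyper_params$reg_alpha <- c(0, 0.1, 1)
hyper_params$min_child_weight <- c(1, 5, 10)

# Number of combinations after adding the two extra hyperparameters.
n_extended <- nrow(expand.grid(hyper_params))
```

The preset grid has 216 combinations and the extended one 1944, which is worth keeping in mind when choosing max_models and max_min_runtime.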

Value

The models from the grid search.


jmp1989/easyr documentation built on May 20, 2019, 7:25 a.m.