ezr.h2o_glm_grid: GLM Grid
In jmp1989/easyr: Helpful wrappers for common EDA, Data Manipulation, & Modeling

Run a GLM grid search over alphas and lambdas. Optionally, pre-screen the variables with an xgboost model to narrow them down to the most important features.

ezr.h2o_glm_grid(train_df, valid_df = NULL, xvars = names(train_df),
  no_regularization = FALSE, family = "binomial", yvar = "target",
  grid_id = "glm_grid", use_prescreen = TRUE,
  prescreen_keepvars_criteria = "number",
  prescreen_keepvars_threshold = 30, xval = TRUE, folds = 5,
  keep_cross_validation_predictions = FALSE, max_models = 4,
  max_min_runtime = 15, ...)

`train_df`	Training dataframe
`valid_df`	If not provided, the training dataframe is split for you 80/20
`xvars`	The xvariables in the model
`no_regularization`	Default is FALSE. Set to TRUE if you want to run plain logistic / linear regression.
`family`	Default is 'binomial'. Use binomial for classification with logistic regression; others are for regression problems. Must be one of: "gaussian", "binomial", "quasibinomial", "ordinal", "multinomial", "poisson", "gamma", "tweedie".
`yvar`	The target variable
`grid_id`	Name of Grid ID
`use_prescreen`	Defaults to TRUE. Whether to pre-screen variables with an xgboost model before the grid search.
`prescreen_keepvars_criteria`	Default is 'number'. Valid values are 'percent' or 'number'. 'number' keeps a fixed number of variables; 'percent' keeps a percentage, based on the xgboost model.
`prescreen_keepvars_threshold`	Default is 30. Set this to a percentage if `prescreen_keepvars_criteria` is 'percent'.
`xval`	Use cross-validation, TRUE/FALSE
`folds`	Number of folds if you use cross-validation
`keep_cross_validation_predictions`	Keep the cross-validation predictions? Defaults to FALSE.
`max_models`	Defaults to 4
`max_min_runtime`	Defaults to 15 minutes. Enter this value in minutes, not seconds.
`...`	Additional model parameters.
`seed`	Defaults to 2018
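
The two pre-screen arguments can be read as a rule for trimming a ranked variable list. The sketch below is only an illustration in base R: `keep_vars` and the importance scores are hypothetical and not part of easyr, and the 'percent' branch assumes the threshold means "keep the top N percent of ranked variables", which this page does not confirm.

```r
# Hypothetical importance scores, standing in for an xgboost importance table.
importance <- c(age = 0.40, income = 0.25, tenure = 0.15,
                region = 0.12, channel = 0.08)

# Hypothetical helper (not part of easyr) mimicking the two documented criteria.
keep_vars <- function(importance, criteria = c("number", "percent"),
                      threshold = 30) {
  criteria <- match.arg(criteria)
  # Rank variable names from most to least important.
  ranked <- names(sort(importance, decreasing = TRUE))
  n_keep <- if (criteria == "number") {
    threshold
  } else {
    # ASSUMPTION: 'percent' keeps the top `threshold` percent of variables.
    ceiling(length(ranked) * threshold / 100)
  }
  # Never ask for more variables than exist.
  head(ranked, min(n_keep, length(ranked)))
}

top_by_number  <- keep_vars(importance, "number", 3)
top_by_percent <- keep_vars(importance, "percent", 40)
```

With the default threshold of 30 and fewer than 30 candidate variables, the 'number' branch of this sketch simply keeps all of them.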

Models can be run with cross-validation (`xval`) and with different GLM families (`family`).

Additionally, there is an easier interface for running regular linear/logistic regressions without any regularization (`no_regularization = TRUE`).

Extra parameters can be passed through `...`. In particular, `max_active_predictors` should be set to help obtain sparse solutions, especially when you don't want the model to run forever.
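
As a rough sketch of how these options combine, the wrapper function below makes one regularized grid call with `max_active_predictors` forwarded through `...`, and one plain call with `no_regularization = TRUE`. It assumes the package is installed under the name `easyr`, that an H2O cluster is already running (for example via `h2o::h2o.init()`), and that plain data frames are accepted. `run_glm_grids`, the grid ids, and the value 20 are illustrative choices, not part of the package.

```r
# Hypothetical wrapper; nothing runs until it is called with real data.
run_glm_grids <- function(train_df, valid_df, yvar = "target") {
  # Exclude the target explicitly rather than relying on the xvars default.
  xvars <- setdiff(names(train_df), yvar)

  # Regularized grid search; max_active_predictors travels through `...`.
  regularized <- easyr::ezr.h2o_glm_grid(
    train_df, valid_df = valid_df, xvars = xvars,
    yvar = yvar, family = "binomial",
    grid_id = "glm_grid_sparse",
    max_models = 4,
    max_min_runtime = 15,          # minutes, not seconds
    max_active_predictors = 20     # illustrative cap on active predictors
  )

  # Plain logistic regression, no regularization.
  plain <- easyr::ezr.h2o_glm_grid(
    train_df, valid_df = valid_df, xvars = xvars,
    yvar = yvar, family = "binomial",
    grid_id = "glm_plain",
    no_regularization = TRUE
  )

  list(regularized = regularized, plain = plain)
}
```

Using a distinct `grid_id` per call is a precaution in this sketch so the two runs are not stored under the same grid name.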

Note that it is ideal to set up the train and valid dataframes before passing in the data, so you are certain of what they are. Note, that for non-regularization models,
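
One way to prepare the two dataframes yourself is a seeded 80/20 split in base R, sketched here. The toy data and column names are hypothetical, and this is not necessarily how the function splits internally when `valid_df` is omitted.

```r
# Fix the RNG so the split is reproducible; 2018 mirrors the documented
# `seed` default, but any fixed value would do.
set.seed(2018)

# Hypothetical toy data with a binary target column.
df <- data.frame(
  x1 = rnorm(100),
  x2 = runif(100),
  target = rbinom(100, 1, 0.5)
)

# Draw 80% of the row indices for training; the rest become validation rows.
train_idx <- sample(nrow(df), size = floor(0.8 * nrow(df)))
train_df <- df[train_idx, ]
valid_df <- df[-train_idx, ]
```

Both frames can then be passed explicitly as `train_df` and `valid_df`, so there is no doubt about which rows were used for validation.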

The grid-searched models.

jmp1989/easyr documentation built on May 20, 2019, 7:25 a.m.