ezr.h2o_glm_grid: GLM Grid

Description

Run a GLM grid search over alpha and lambda values. Optionally, pre-screen with an xgboost model to narrow the predictors down to the most important features.

Usage

ezr.h2o_glm_grid(train_df, valid_df = NULL, xvars = names(train_df),
  no_regularization = FALSE, family = "binomial", yvar = "target",
  grid_id = "glm_grid", use_prescreen = TRUE,
  prescreen_keepvars_criteria = "number",
  prescreen_keepvars_threshold = 30, xval = TRUE, folds = 5,
  keep_cross_validation_predictions = FALSE, max_models = 4,
  max_min_runtime = 15, ...)

Arguments

train_df

Training dataframe

valid_df

If not provided, the training dataframe is split for you 80/20

xvars

The predictor (x) variables in the model. Defaults to all column names of train_df.

no_regularization

Default is FALSE. Set to TRUE if you want to run plain logistic or linear regression without regularization.

family

Default is 'binomial'. Use 'binomial' for binary classification with logistic regression; 'multinomial' and 'ordinal' are for multi-class targets, and the remaining families are mainly for regression problems. Must be one of: "gaussian", "binomial", "quasibinomial", "ordinal", "multinomial", "poisson", "gamma", "tweedie".

yvar

The target variable

grid_id

Name of Grid ID

use_prescreen

Defaults to TRUE. Whether to pre-screen variables with an xgboost model before the grid search.

prescreen_keepvars_criteria

Default is 'number'. Valid values are 'percent' or 'number'. 'number' keeps a fixed number of variables; 'percent' keeps a percentage of the variables, as ranked by the xgboost model.

prescreen_keepvars_threshold

Default is 30. Change this to a percentage value if prescreen_keepvars_criteria is 'percent'.

xval

Cross validation, TRUE/FALSE

folds

Number of folds if you use cross validation. Defaults to 5.

keep_cross_validation_predictions

Keep the cross-validation predictions? Defaults to FALSE.

max_models

Maximum number of models to build in the grid. Defaults to 4, per the usage signature.

max_min_runtime

Maximum runtime. Defaults to 15 minutes. Remember to enter this in minutes, not seconds.

...

Additional model parameters.

seed

Defaults to 2018

Details

Models can be run with cross-validation (xval) and can use different GLM families.

Additionally, there is an easier interface for running regular linear or logistic regressions without any regularization.
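
As a rough illustration of the no-regularization interface, the sketch below sets up a plain linear regression call using only arguments from the Usage signature. The toy data frame and its column names are made up. The sketch also assumes that the function is exported from an installed easyr package, that a local H2O cluster can be started, and that plain data frames are accepted; none of this is confirmed on this page. The call is skipped if either package is missing.

```r
# Hypothetical toy data: two predictors and a numeric target.
set.seed(2018)
train_df <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
train_df$target <- 2 * train_df$x1 - train_df$x2 + rnorm(100)

# Argument names come from the Usage signature; the values are illustrative.
lm_args <- list(
  train_df = train_df,
  xvars = c("x1", "x2"),
  yvar = "target",
  family = "gaussian",        # linear regression
  no_regularization = TRUE,   # plain regression, no penalty
  use_prescreen = FALSE       # only two predictors, nothing to screen
)

# Run only when easyr and h2o are installed (assumption: the function is exported).
if (requireNamespace("easyr", quietly = TRUE) &&
    requireNamespace("h2o", quietly = TRUE)) {
  h2o::h2o.init()
  fit <- do.call(easyr::ezr.h2o_glm_grid, lm_args)
}
```

Because valid_df is left out here, the function would split train_df 80/20 itself, as described under Arguments.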

Extra parameters can be passed through "...". In particular, max_active_predictors should be set to help obtain sparse solutions, especially when you don't want the model to run forever.
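
A minimal sketch of forwarding max_active_predictors through "..." might look like the following. The data, the cap of 20, and the other argument values are illustrative only. It is assumed that the function is exported from an installed easyr package and that extra arguments reach the underlying H2O GLM unchanged; the call is skipped when easyr or h2o is unavailable.

```r
# Hypothetical toy data with a binary target.
set.seed(2018)
train_df <- data.frame(
  x1 = rnorm(200),
  x2 = rnorm(200),
  x3 = runif(200),
  target = rbinom(200, size = 1, prob = 0.3)
)

# Named arguments follow the Usage signature; max_active_predictors
# is not in the signature and travels through `...`.
glm_args <- list(
  train_df = train_df,
  xvars = setdiff(names(train_df), "target"),
  yvar = "target",
  family = "binomial",
  use_prescreen = FALSE,        # toy data has too few columns to screen
  max_models = 4,
  max_min_runtime = 15,         # minutes, not seconds
  max_active_predictors = 20    # illustrative cap to encourage sparsity
)

# Run only when easyr and h2o are installed (assumption: the function is exported).
if (requireNamespace("easyr", quietly = TRUE) &&
    requireNamespace("h2o", quietly = TRUE)) {
  h2o::h2o.init()
  grid <- do.call(easyr::ezr.h2o_glm_grid, glm_args)
}
```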

Note that it is ideal to set up the train and valid dataframes before passing in the data, so you are certain of what they contain. Note that for non-regularization models,
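
One way to prepare the train and valid dataframes up front is a simple random 80/20 split in base R, sketched below. How the function splits the data internally is not described here, so the random split, the toy columns, and the seed value are assumptions for illustration.

```r
# Hypothetical toy data standing in for a real modelling table.
set.seed(2018)
n <- 200
df <- data.frame(
  x1 = rnorm(n),
  x2 = runif(n),
  target = rbinom(n, size = 1, prob = 0.3)
)

# Draw 80% of the row indices for training; the rest become validation.
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train_df <- df[train_idx, ]
valid_df <- df[-train_idx, ]

# Sanity checks: the two sets are disjoint and together cover every row.
stopifnot(nrow(train_df) + nrow(valid_df) == n)
stopifnot(length(intersect(rownames(train_df), rownames(valid_df))) == 0)
```

The resulting train_df and valid_df can then be passed as the first two arguments, so the split is known and reproducible rather than left to the default.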

Value

The grid of searched GLM models.


jmp1989/easyr documentation built on May 20, 2019, 7:25 a.m.