nano_automl: Building Automated H2O Models

nano_automl    R Documentation

Building Automated H2O Models

Description

Creates robust, fast models using H2O's h2o.automl function implemented with nano objects.

Usage

nano_automl(
  nano = nano::create_nano(),
  response,
  data,
  test,
  train_test = NA,
  ignore_vars = c(),
  weight_column = NULL,
  fold_column = NULL,
  nfolds = NA,
  max_models = 3,
  max_time = 10 * 60,
  thresh = 10,
  monotone_constraints = NULL,
  exclude_algos = c("StackedEnsemble", "DeepLearning"),
  include_algos = NULL,
  plots = TRUE,
  alarm = TRUE,
  quiet = FALSE,
  save = FALSE,
  subdir = NA,
  project = "ML Project",
  seed = 628,
  project_name = paste0("grid_", nano$n_model + 1),
  grid_description = "",
  ...
)

Arguments

nano

nano object to store the model in. If not specified, a new nano object will be created and the results will be stored in it.

response

a character. Target variable for model.

data

a data.frame containing the data used to train the model. May also contain testing and holdout data, in which case the train_test argument must be specified.

test

a data.frame containing testing dataset. If this is provided, the train_test, fold_column and nfolds arguments cannot be used.

train_test

a character. Variable in data which contains split for training, testing and holdout datasets (optional). Can only have the values: "training", "test", "holdout".

ignore_vars

a vector of characters. Variables in the dataset which should not be used for modelling. Note that if any of the train_test, weight_column or fold_column arguments are specified, those variables are automatically included in ignore_vars.

weight_column

a character. Column name in data containing weights if used.

fold_column

a character. Column name in data containing fold assignments if used. If this is provided, the test and nfolds arguments cannot be used. The train_test argument can be used, however it cannot contain the value "test".

nfolds

a numeric. Number of folds used in cross-validation. If this is provided, the test and fold_column arguments cannot be used. The train_test argument can be used, however it cannot contain the value "test".

max_models

a numeric. Maximum number of models to be built.

max_time

a numeric. Maximum amount of time spent building models, in seconds (the default, 10 * 60, is ten minutes).

thresh

a numeric. Cutoff on the number of unique values in the response variable, used to decide whether to perform classification or regression. Default value is 10.

monotone_constraints

a list. Mapping between variable names in data to values +1 or -1. Use +1 to enforce an increasing constraint while use -1 for a decreasing constraint. Constraints are only valid for numerical columns.

exclude_algos

a vector of characters. Algorithms which should be excluded from the training process.

include_algos

a vector of characters. Algorithms to be included in training process. Set to NULL to ignore. If exclude_algos and include_algos are both provided, only include_algos will be used.

plots

a logical. Whether to produce plots.

alarm

a logical. Whether to beep when function has finished running.

quiet

a logical. Whether to suppress messages printed to the console.

seed

a numeric. Random seed, used for reproducibility.

grid_description

a character. Optional description of grid. Can be later accessed by nano$grid[[grid_no]]@meta$description.

...

further parameters to pass to h2o.grid depending on algo.
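As an illustration of the train_test and monotone_constraints arguments described above, the following base R sketch builds a toy data frame with a split column and a constraint list, then checks both against the documented rules. The data and all column names (land_size, distance, data_id) are hypothetical, and the checks are only an assumption about what is sensible to verify beforehand; they are not the package's own validation.

```r
set.seed(628)

# Toy data; every column name here is hypothetical.
n <- 100
toy <- data.frame(
  sale_price = rnorm(n, mean = 500, sd = 50),
  land_size  = runif(n, min = 200, max = 900),
  distance   = runif(n, min = 1, max = 40)
)

# Split column for `train_test`: only these three labels are allowed.
split_levels <- c("training", "test", "holdout")
toy$data_id <- sample(split_levels, size = n, replace = TRUE,
                      prob = c(0.7, 0.2, 0.1))

# Mapping for `monotone_constraints`: +1 increasing, -1 decreasing.
constraints <- list(land_size = 1, distance = -1)

# Sanity checks mirroring the documented rules.
stopifnot(
  all(toy$data_id %in% split_levels),                           # valid split labels
  all(names(constraints) %in% names(toy)),                      # columns exist
  all(vapply(toy[names(constraints)], is.numeric, logical(1))), # numeric columns only
  all(unlist(constraints) %in% c(-1, 1))                        # values are +1 or -1
)
```

With an active H2O connection, these objects could then be passed as nano_automl(data = toy, response = "sale_price", train_test = "data_id", monotone_constraints = constraints).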

Details

This function uses H2O's h2o.automl function to quickly build several different machine learning models. An active H2O connection is required (i.e. run h2o.init()) before using this function.

For more details, please see the documentation for h2o.automl.
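Because an active H2O connection is required, it can be convenient to test for one before calling the function. The sketch below is one possible way to do this; the helper name h2o_is_ready is hypothetical and not part of nano or h2o. It relies on h2o.clusterIsUp() from the h2o package and treats any error (for example, no connection having been created yet) as "not connected".

```r
# Hypothetical helper: TRUE if an H2O cluster is reachable, FALSE otherwise.
h2o_is_ready <- function() {
  if (!requireNamespace("h2o", quietly = TRUE)) {
    return(FALSE)  # the h2o package is not installed
  }
  # h2o.clusterIsUp() fails when no connection exists yet,
  # so any error is interpreted as "no active connection".
  isTRUE(tryCatch(h2o::h2o.clusterIsUp(), error = function(e) FALSE))
}

if (!h2o_is_ready()) {
  message("No active H2O connection; run h2o::h2o.init() before nano_automl().")
}
```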

Value

nano object with new entry filled with models produced.

Examples

## Not run: 
if(interactive()){
 library(h2o)
 library(nano)
 
 h2o.init()
 
 # import dataset
 data(property_prices)
 # prepare data for modelling
 data_all <- nano::data_prep(data          = property_prices, 
                             response      = "sale_price",
                             split_or_fold = 0.7,
                             holdout_ratio = 0.1)
 data <- data_all$data
 
 # create models
 nano <- nano_automl(data        = data, 
                     response    = "sale_price", 
                     train_test  = "data_id",
                     ignore_vars = "data_id")
 
}

## End(Not run)

Nanoputian628/nano documentation built on Oct. 30, 2023, 3:28 p.m.