nano_automl: Building Automated H2O Models
In Nanoputian628/nano: Data Visualisation and Model Selection

nano_automl

R Documentation

Building Automated H2O Models

Description

Builds models quickly with H2O's h2o.automl function and stores the results in a nano object.

Usage

nano_automl(
  nano = nano::create_nano(),
  response,
  data,
  test,
  train_test = NA,
  ignore_vars = c(),
  weight_column = NULL,
  fold_column = NULL,
  nfolds = NA,
  max_models = 3,
  max_time = 10 * 60,
  thresh = 10,
  monotone_constraints = NULL,
  exclude_algos = c("StackedEnsemble", "DeepLearning"),
  include_algos = NULL,
  plots = TRUE,
  alarm = TRUE,
  quiet = FALSE,
  save = FALSE,
  subdir = NA,
  project = "ML Project",
  seed = 628,
  project_name = paste0("grid_", nano$n_model + 1),
  grid_description = "",
  ...
)

Arguments

`nano`	nano object to store the models in. If not specified, a new nano object will be created and the results will be stored in it.
`response`	a character. Target variable for model.
`data`	a data.frame containing the data used to train the model. May also contain testing and holdout data, in which case the `train_test` argument must be specified.
`test`	a data.frame containing testing dataset. If this is provided, the `train_test`, `fold_column` and `nfolds` arguments cannot be used.
`train_test`	a character. Variable in `data` which contains split for training, testing and holdout datasets (optional). Can only have the values: "training", "test", "holdout".
`ignore_vars`	vector of characters. Variables in the dataset which should not be used for modelling. Note, if any of `train_test`, `weight_column` or `fold_column` arguments are specified, those variables will be automatically included in `ignore_vars`.
`weight_column`	a character. Column name in `data` containing weights if used.
`fold_column`	a character. Column name in `data` containing fold assignments if used. If this is provided, the `test` and `nfolds` arguments cannot be used. The `train_test` argument can be used; however, it cannot contain the value "test".
`nfolds`	a numeric. Number of folds used in cross-validation. If this is provided, the `test` and `fold_column` arguments cannot be used. The `train_test` argument can be used; however, it cannot contain the value "test".
`max_models`	a numeric. Maximum number of models to be built.
`max_time`	a numeric. Maximum amount of time spent building models.
`thresh`	a numeric. Cutoff on the number of unique values in the response variable, used to decide whether to perform classification or regression. Default value is 10.
`monotone_constraints`	a list. Mapping from variable names in `data` to the values +1 or -1. Use +1 to enforce an increasing constraint and -1 to enforce a decreasing constraint. Constraints are only valid for numerical columns.
`exclude_algos`	a vector of characters. Algorithms which should be excluded from the training process.
`include_algos`	a vector of characters. Algorithms to be included in the training process. Set to `NULL` to ignore. If `exclude_algos` and `include_algos` are both provided, only `include_algos` will be used.
`plots`	a logical. Whether to produce plots.
`alarm`	a logical. Whether to beep when function has finished running.
`quiet`	a logical. Whether to suppress messages printed to the console.
`seed`	a numeric. Random seed, used for reproducibility.
`grid_description`	a character. Optional description of grid. Can be later accessed by `nano$grid[[grid_no]]@meta$description`.
`...`	further parameters to pass to `h2o.automl`.
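
The rules linking `test`, `train_test`, `fold_column` and `nfolds` can be summarised in code. The following is a minimal sketch in base R, not part of nano: `check_split_args` and the `toy` data are hypothetical, and the sketch assumes that `nfolds` excludes `test` and `fold_column` in the same way that `fold_column` excludes `test` and `nfolds`.

```r
# Hypothetical helper (not part of nano): mirrors the documented argument rules.
check_split_args <- function(data, test = NULL, train_test = NA,
                             fold_column = NULL, nfolds = NA) {
  has_test  <- !is.null(test)
  has_split <- !is.na(train_test)
  has_fold  <- !is.null(fold_column)
  has_nfold <- !is.na(nfolds)

  # A separate test set rules out every other way of splitting the data.
  if (has_test && (has_split || has_fold || has_nfold)) {
    stop("`test` cannot be combined with `train_test`, `fold_column` or `nfolds`.")
  }
  # Assumption: fold assignments and a fold count are mutually exclusive.
  if (has_fold && has_nfold) {
    stop("`fold_column` and `nfolds` cannot be used together.")
  }
  if (has_split) {
    if (!train_test %in% names(data)) {
      stop("`train_test` must name a column in `data`.")
    }
    values  <- unique(as.character(data[[train_test]]))
    allowed <- c("training", "test", "holdout")
    if (!all(values %in% allowed)) {
      stop("`train_test` may only contain: training, test, holdout.")
    }
    # With cross-validation, the split column may not contain "test".
    if ((has_fold || has_nfold) && "test" %in% values) {
      stop("`train_test` cannot contain \"test\" when cross-validation is used.")
    }
  }
  invisible(TRUE)
}

# Toy data with a split column holding the three allowed values.
set.seed(628)
toy <- data.frame(x = rnorm(10), y = rnorm(10))
toy$data_id <- sample(rep(c("training", "test", "holdout"), times = c(7, 2, 1)))

check_split_args(toy, train_test = "data_id")               # passes
# check_split_args(toy, train_test = "data_id", nfolds = 5) # would stop()
```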

Details

This function uses H2O's h2o.automl function to easily and quickly build several different machine learning models. Importantly, an active H2O connection is required (i.e. run h2o.init()) before using this function.

For more details, please see the documentation for h2o.automl.
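
Because an active H2O connection is required, it can help to check for one before calling nano_automl. The following is a minimal sketch: `h2o_is_running` is a hypothetical helper, not part of nano or h2o, that wraps `h2o::h2o.clusterIsUp()` and returns `FALSE` when the h2o package is not installed or no connection exists.

```r
# Hypothetical helper: TRUE only if h2o is installed and a cluster responds.
h2o_is_running <- function() {
  if (!requireNamespace("h2o", quietly = TRUE)) {
    return(FALSE)
  }
  # h2o.clusterIsUp() raises an error when no connection has been made,
  # so any error is treated as "not running".
  tryCatch(isTRUE(h2o::h2o.clusterIsUp()), error = function(e) FALSE)
}

if (!h2o_is_running()) {
  message("No active H2O connection; call h2o::h2o.init() before nano_automl().")
}
```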

Value

A nano object with a new entry containing the models produced.

Examples

## Not run: 
if(interactive()){
 library(h2o)
 library(nano)
 
 h2o.init()
 
 # import dataset
 data(property_prices)
 # prepare data for modelling
 data_all <- nano::data_prep(data          = property_prices, 
                             response      = "sale_price",
                             split_or_fold = 0.7,
                             holdout_ratio = 0.1)
 data <- data_all$data
 
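 # create models; `data_id` is the column in `data` holding the
 # training/test/holdout split, so it is passed as `train_test`
 nano <- nano_automl(data        = data, 
                     response    = "sale_price", 
                     train_test  = "data_id",
                     ignore_vars = "data_id")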
 
 }

## End(Not run)
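
The example above uses a training/test split. A cross-validated run might look like the sketch below, which only uses arguments listed under Usage. `fit_cv_automl` and `increasing_var` are hypothetical names introduced for illustration; the function is only defined here, not run, since running it requires the nano and h2o packages, an active H2O connection, and data that does not contain "test" rows.

```r
# Sketch only: wraps nano_automl() for a 5-fold cross-validated GBM run.
# `increasing_var` is a placeholder for a numeric column in `data` that
# should have an increasing (+1) monotone constraint.
fit_cv_automl <- function(data, response, increasing_var) {
  # monotone_constraints is a named list mapping column names to +1 or -1
  constraints <- stats::setNames(list(1), increasing_var)

  nano::nano_automl(
    data                 = data,
    response             = response,
    nfolds               = 5,       # cannot be combined with `test`
    max_models           = 5,
    include_algos        = "GBM",   # takes precedence over exclude_algos
    monotone_constraints = constraints,
    grid_description     = "5-fold CV, GBM only"
  )
}
```

The description passed as `grid_description` can later be read back with `nano$grid[[grid_no]]@meta$description`, as noted under Arguments.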

Nanoputian628/nano documentation built on Oct. 30, 2023, 3:28 p.m.