h2o_train: Model wrappers for h2o

View source: R/h2o_train.R


Model wrappers for h2o

Description

Basic model wrappers for h2o model functions that include data conversion, seed configuration, and so on.

Usage

h2o_train(
  x,
  y,
  model,
  weights = NULL,
  validation = NULL,
  save_data = FALSE,
  ...
)

h2o_train_rf(x, y, ntrees = 50, mtries = -1, min_rows = 1, ...)

h2o_train_xgboost(
  x,
  y,
  ntrees = 50,
  max_depth = 6,
  min_rows = 1,
  learn_rate = 0.3,
  sample_rate = 1,
  col_sample_rate = 1,
  min_split_improvement = 0,
  stopping_rounds = 0,
  validation = NULL,
  ...
)

h2o_train_gbm(
  x,
  y,
  ntrees = 50,
  max_depth = 6,
  min_rows = 1,
  learn_rate = 0.3,
  sample_rate = 1,
  col_sample_rate = 1,
  min_split_improvement = 0,
  stopping_rounds = 0,
  ...
)

h2o_train_glm(x, y, lambda = NULL, alpha = NULL, ...)

h2o_train_nb(x, y, laplace = 0, ...)

h2o_train_mlp(
  x,
  y,
  hidden = 200,
  l2 = 0,
  hidden_dropout_ratios = 0,
  epochs = 10,
  activation = "Rectifier",
  validation = NULL,
  ...
)

h2o_train_rule(
  x,
  y,
  rule_generation_ntrees = 50,
  max_rule_length = 5,
  lambda = NULL,
  ...
)

h2o_train_auto(x, y, verbosity = NULL, save_data = FALSE, ...)

Arguments

x

A data frame of predictors.

y

A vector of outcomes.

model

A character string for the model. Current selections are "automl", "randomForest", "xgboost", "gbm", "glm", "deeplearning", "rulefit" and "naiveBayes". Use h2o_xgboost_available() to see if xgboost can be used on your OS/h2o server.
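
For example, a minimal sketch of requesting a gradient boosting model through the generic interface (assuming a running h2o server; arguments such as ntrees are passed through ... to the underlying h2o function):

if (h2o_running()) {
  # "gbm" dispatches to h2o::h2o.gbm(); ntrees travels via `...`
  fit <- h2o_train(
    x = mtcars[, -1],
    y = mtcars$mpg,
    model = "gbm",
    ntrees = 100
  )
}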

weights

A numeric vector of case weights.

validation

A number between 0 and 1 specifying the proportion of the data reserved as a validation set, used by h2o for performance assessment and potential early stopping. Defaults to 0.
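
As a sketch, holding out 10% of rows when fitting a neural network (assumes a running h2o server; the values are illustrative):

if (h2o_running()) {
  # h2o scores on the held-out 10% while training
  fit <- h2o_train_mlp(
    x = mtcars[, -1],
    y = mtcars$mpg,
    validation = 0.1,
    epochs = 20
  )
}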

save_data

A logical for whether the training data should be saved on the h2o server. Set this to TRUE for AutoML models that need to be re-fitted later.

...

Other options to pass to the h2o model functions (e.g., h2o::h2o.randomForest()).

ntrees

Number of trees. Defaults to 50.

mtries

Number of variables randomly sampled as candidates at each split. If set to -1, defaults to sqrt(p) for classification and p/3 for regression, where p is the number of predictors. Defaults to -1.
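
A minimal random forest sketch overriding that heuristic (assumes a running h2o server; the values are illustrative):

if (h2o_running()) {
  # consider 5 candidate predictors at each split instead of p/3
  fit <- h2o_train_rf(
    x = mtcars[, -1],
    y = mtcars$mpg,
    ntrees = 200,
    mtries = 5
  )
}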

min_rows

Fewest allowed (weighted) observations in a leaf. Defaults to 1.

max_depth

Maximum tree depth (0 for unlimited). Defaults to 6.

learn_rate

(same as eta) Learning rate (from 0.0 to 1.0). Defaults to 0.3.

sample_rate

Row sample rate per tree (from 0.0 to 1.0). Defaults to 1.

col_sample_rate

(same as colsample_bylevel) Column sample rate (from 0.0 to 1.0). Defaults to 1.

min_split_improvement

Minimum relative improvement in squared error reduction for a split to happen. Defaults to 0.

stopping_rounds

Early stopping based on convergence of stopping_metric. Stop if the simple moving average of length k of the stopping_metric does not improve for k := stopping_rounds scoring events (0 to disable). Defaults to 0.
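
For instance, a sketch pairing stopping_rounds with a validation split so the moving average is computed on held-out data (assumes a running h2o server with xgboost available):

if (h2o_running()) {
  # stop once 3 consecutive scoring events show no improvement
  fit <- h2o_train_xgboost(
    x = mtcars[, -1],
    y = mtcars$mpg,
    ntrees = 500,
    validation = 0.1,
    stopping_rounds = 3
  )
}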

lambda

Regularization strength.

alpha

Distribution of regularization between the L1 (Lasso) and L2 (Ridge) penalties. A value of 1 for alpha represents Lasso regression, a value of 0 produces Ridge regression, and anything in between specifies the amount of mixing between the two. Default value of alpha is 0 when SOLVER = 'L-BFGS'; 0.5 otherwise.
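
For example, an elastic net sketch mixing the two penalties equally (assumes a running h2o server; the values are illustrative):

if (h2o_running()) {
  # alpha = 0.5 blends L1 and L2; lambda sets the overall strength
  fit <- h2o_train_glm(
    x = mtcars[, -1],
    y = mtcars$mpg,
    lambda = 0.1,
    alpha = 0.5
  )
}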

laplace

Laplace smoothing parameter. Defaults to 0.

hidden

Hidden layer sizes (e.g., c(100, 100)). Defaults to 200.

l2

L2 regularization (can add stability and improve generalization; causes many weights to be small). Defaults to 0.

hidden_dropout_ratios

Hidden layer dropout ratios (can improve generalization); specify one value per hidden layer, and use a *WithDropout activation for the ratios to take effect. Defaults to 0.

epochs

How many times the dataset should be iterated (streamed); can be fractional. Defaults to 10.

activation

Activation function. Must be one of: "Tanh", "TanhWithDropout", "Rectifier", "RectifierWithDropout", "Maxout", "MaxoutWithDropout". Defaults to Rectifier.
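
A sketch of a two-layer network with dropout; a *WithDropout activation is needed for hidden_dropout_ratios to apply, with one ratio per hidden layer (assumes a running h2o server):

if (h2o_running()) {
  # two hidden layers of 100 units, each with 20% dropout
  fit <- h2o_train_mlp(
    x = mtcars[, -1],
    y = mtcars$mpg,
    hidden = c(100, 100),
    activation = "RectifierWithDropout",
    hidden_dropout_ratios = c(0.2, 0.2),
    epochs = 20
  )
}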

rule_generation_ntrees

Specifies the number of trees to build in the tree model. Defaults to 50.

max_rule_length

Maximum length of rules. Defaults to 5.
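
A minimal RuleFit sketch (assumes a running h2o server; the values are illustrative):

if (h2o_running()) {
  # mine rules from 30 trees, capped at 4 conditions per rule
  fit <- h2o_train_rule(
    x = mtcars[, -1],
    y = mtcars$mpg,
    rule_generation_ntrees = 30,
    max_rule_length = 4
  )
}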

verbosity

Verbosity of the backend messages printed during training. Must be one of: NULL (live log disabled), "debug", "info", "warn", "error". Defaults to NULL.
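
For example, a short AutoML sketch with live logging; max_runtime_secs is an h2o::h2o.automl() option passed through ... (assumes a running h2o server):

if (h2o_running()) {
  # keep the training frame on the server so models can be re-fitted,
  # and stream "info"-level backend messages during the search
  fit <- h2o_train_auto(
    x = mtcars[, -1],
    y = mtcars$mpg,
    verbosity = "info",
    save_data = TRUE,
    max_runtime_secs = 30
  )
}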

Value

An h2o model object.

Examples


# start with h2o::h2o.init()
if (h2o_running()) {
 # -------------------------------------------------------------------------
 # Using the model wrappers:
 h2o_train_glm(mtcars[, -1], mtcars$mpg)

 # -------------------------------------------------------------------------
 # Using parsnip:

 spec <-
   rand_forest(mtry = 3, trees = 500) %>%
   set_engine("h2o") %>%
   set_mode("regression")

 set.seed(1)
 mod <- fit(spec, mpg ~ ., data = mtcars)
 mod

 predict(mod, head(mtcars))
}

