h2o_train — R Documentation
Basic model wrappers around h2o model functions that handle data conversion, seed configuration, and other details.
h2o_train(
  x,
  y,
  model,
  weights = NULL,
  validation = NULL,
  save_data = FALSE,
  ...
)

h2o_train_rf(x, y, ntrees = 50, mtries = -1, min_rows = 1, ...)

h2o_train_xgboost(
  x,
  y,
  ntrees = 50,
  max_depth = 6,
  min_rows = 1,
  learn_rate = 0.3,
  sample_rate = 1,
  col_sample_rate = 1,
  min_split_improvement = 0,
  stopping_rounds = 0,
  validation = NULL,
  ...
)

h2o_train_gbm(
  x,
  y,
  ntrees = 50,
  max_depth = 6,
  min_rows = 1,
  learn_rate = 0.3,
  sample_rate = 1,
  col_sample_rate = 1,
  min_split_improvement = 0,
  stopping_rounds = 0,
  ...
)

h2o_train_glm(x, y, lambda = NULL, alpha = NULL, ...)

h2o_train_nb(x, y, laplace = 0, ...)

h2o_train_mlp(
  x,
  y,
  hidden = 200,
  l2 = 0,
  hidden_dropout_ratios = 0,
  epochs = 10,
  activation = "Rectifier",
  validation = NULL,
  ...
)

h2o_train_rule(
  x,
  y,
  rule_generation_ntrees = 50,
  max_rule_length = 5,
  lambda = NULL,
  ...
)

h2o_train_auto(x, y, verbosity = NULL, save_data = FALSE, ...)
x: A data frame of predictors.

y: A vector of outcomes.

model: A character string for the model. Current selections are

weights: A numeric vector of case weights.

validation: A number between 0 and 1 specifying the proportion of the data reserved as a validation set. This is used by h2o for performance assessment and potential early stopping. Defaults to 0.

save_data: A logical for whether the training data should be saved on the h2o server; set this to

...: Other options to pass to the h2o model functions (e.g.,
ntrees: Number of trees. Defaults to 50.

mtries: Number of variables randomly sampled as candidates at each split. If set to -1, defaults to sqrt(p) for classification and p/3 for regression, where p is the number of predictors. Defaults to -1.

min_rows: Fewest allowed (weighted) observations in a leaf. Defaults to 1.

max_depth: Maximum tree depth (0 for unlimited). Defaults to 6.

learn_rate: (same as eta) Learning rate (from 0.0 to 1.0). Defaults to 0.3.

sample_rate: Row sample rate per tree (from 0.0 to 1.0). Defaults to 1.

col_sample_rate: (same as colsample_bylevel) Column sample rate (from 0.0 to 1.0). Defaults to 1.

min_split_improvement: Minimum relative improvement in squared error reduction for a split to happen. Defaults to 0.

stopping_rounds: Early stopping based on convergence of stopping_metric. Stop if the simple moving average of length k of the stopping_metric does not improve for k := stopping_rounds scoring events (0 to disable). Defaults to 0.

lambda: Regularization strength.

alpha: Distribution of regularization between the L1 (Lasso) and L2 (Ridge) penalties. A value of 1 for alpha represents Lasso regression, a value of 0 produces Ridge regression, and anything in between specifies the amount of mixing between the two. The default value of alpha is 0 when SOLVER = 'L-BFGS' and 0.5 otherwise.

laplace: Laplace smoothing parameter. Defaults to 0.
hidden: Hidden layer sizes (e.g., c(100, 100)). Defaults to 200.

l2: L2 regularization (can add stability and improve generalization; causes many weights to be small). Defaults to 0.

hidden_dropout_ratios: Hidden layer dropout ratios (can improve generalization); specify one value per hidden layer. Defaults to 0.

epochs: How many times the dataset should be iterated (streamed); can be fractional. Defaults to 10.

activation: Activation function. Must be one of "Tanh", "TanhWithDropout", "Rectifier", "RectifierWithDropout", "Maxout", "MaxoutWithDropout". Defaults to "Rectifier".

rule_generation_ntrees: Number of trees to build in the tree model. Defaults to 50.

max_rule_length: Maximum length of rules. Defaults to 5.

verbosity: Verbosity of the backend messages printed during training. Must be one of NULL (live log disabled), "debug", "info", "warn", or "error". Defaults to NULL.
Value: An h2o model object.
# start with h2o::h2o.init()
if (h2o_running()) {
  # -------------------------------------------------------------------------
  # Using the model wrappers:
  h2o_train_glm(mtcars[, -1], mtcars$mpg)

  # -------------------------------------------------------------------------
  # using parsnip:
  spec <-
    rand_forest(mtry = 3, trees = 500) %>%
    set_engine("h2o") %>%
    set_mode("regression")

  set.seed(1)
  mod <- fit(spec, mpg ~ ., data = mtcars)
  mod

  predict(mod, head(mtcars))
}
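The other wrappers follow the same calling pattern; for instance, a direct random forest fit with a non-default number of split candidates might look like the following sketch (again assuming a running h2o cluster, with parameter values chosen only for illustration):

```r
if (h2o_running()) {
  # Random forest on mtcars: 3 candidate variables per split, 500 trees.
  h2o_train_rf(mtcars[, -1], mtcars$mpg, ntrees = 500, mtries = 3)
}
```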