SL.h2o_auto: Automatic machine learning using h2o

View source: R/sl_h2o_auto.R

SL.h2o_auto R Documentation

Automatic machine learning using h2o

Description

Requires a recent version of the h2o package that provides h2o.automl().

Usage

SL.h2o_auto(
  Y,
  X,
  newX,
  family,
  obsWeights,
  id,
  nthreads = 1,
  max_runtime_secs = NULL,
  max_models = 20,
  stopping_metric = NULL,
  stopping_rounds = 7,
  nfolds = 10,
  verbose = T,
  ...
)

Arguments

Y

Outcome variable

X

Covariate dataframe

newX

Optional dataframe of observations for which to predict the outcome

family

"gaussian" for regression, "binomial" for binary classification, "multinomial" for multiclass classification (not yet supported).

obsWeights

Optional observation-level weights (supported but not tested)

id

Optional id to group observations from the same unit (not used currently).

nthreads

Number of threads to use if an h2o cluster has not already been started.

max_runtime_secs

Maximum runtime in seconds. Using a time limit does not yield reproducible results.

max_models

Maximum number of models to fit; a key parameter for improving performance.

stopping_metric

Metric to optimize towards.

stopping_rounds

Stop if metric does not improve for this many consecutive rounds.

nfolds

Number of folds for internal cross-validation.

verbose

If TRUE, display extra output.

...

Any remaining arguments, not used.
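
The input conventions described in the arguments above can be checked before fitting. The following is a minimal sketch in base R only; `check_sl_inputs` is a hypothetical helper that is not part of ckTools or SuperLearner, and the rule it uses to guess `family` (two distinct outcome values means "binomial") is an assumption made for illustration.

```r
# Hypothetical helper (not part of ckTools or SuperLearner): verifies that the
# inputs have the shapes described in the Arguments section.
check_sl_inputs <- function(Y, X, newX = X, obsWeights = rep(1, length(Y))) {
  stopifnot(
    is.data.frame(X),
    is.data.frame(newX),
    length(Y) == nrow(X),
    length(obsWeights) == length(Y),
    identical(names(X), names(newX))
  )
  # Assumption for illustration: two distinct outcome values suggest
  # "binomial"; anything else is treated as "gaussian".
  family <- if (length(unique(Y)) == 2) "binomial" else "gaussian"
  list(Y = Y, X = X, newX = newX, obsWeights = obsWeights, family = family)
}

# Built-in mtcars data: `am` is a 0/1 outcome, the other columns are covariates.
inputs <- check_sl_inputs(Y = mtcars$am, X = mtcars[, c("mpg", "wt", "hp")])
inputs$family
```

Because `newX` defaults to `X` here, omitting it yields predictions on the training covariates, which mirrors the "optional" wording above.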

Examples

# Enable data.table h2o import, which should be faster.
# Make sure data.table and slam R packages are installed too.
options("h2o.use.data.table" = TRUE)
## Not run: 
library(h2o)
# Start an h2o server with all (physical) cores usable.
local_h2o = h2o.init(nthreads = RhpcBLASctl::get_num_cores(),
                     # May need to specify extra memory.
                     max_mem_size = "8g")

library(SuperLearner)
h2o_auto = create.Learner("SL.h2o_auto",
                          # Increase max models and stopping rounds for better models.
                          # Decrease nfolds for faster training but less certainty.
                          params = list(max_models = 30,
                                        stopping_rounds = 5,
                                        nfolds = 10))
# Y (binary outcome) and X (covariate dataframe) are assumed to exist already.
sl =
  SuperLearner(Y = Y, X = X,
               family = binomial(),
               SL.library = c("SL.mean", h2o_auto$names),
               verbose = TRUE,
               # Stratify during CV in case of rare outcome.
               cvControl = SuperLearner.CV.control(V = 10L, stratifyCV = TRUE))
print(sl)

h2o.shutdown()

## End(Not run)
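
The example uses `Y` and `X` without defining them. One possible way to create suitable objects is sketched below with simulated data; the sample size, covariate names, and outcome model are all assumptions chosen only for illustration and do not come from the package.

```r
# Simulated data for illustration only; any binary outcome vector and
# covariate dataframe of matching length would serve the same purpose.
set.seed(1)
n <- 200

# Covariate dataframe with two numeric predictors (hypothetical names).
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))

# Binary outcome whose probability depends on x1 through a logistic link.
Y <- rbinom(n, size = 1, prob = plogis(0.5 * X$x1))

# Basic checks that the objects match what SuperLearner() expects.
stopifnot(is.data.frame(X), length(Y) == nrow(X), all(Y %in% c(0, 1)))
```

With a rare outcome, a larger `n` may be needed so that each of the 10 stratified CV folds contains observations from both classes.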

ck37/ckTools documentation built on April 29, 2023, 11:47 p.m.