View source: R/hmda.autoEnsemble.R
hmda.autoEnsemble (R Documentation)
This function is a wrapper within the HMDA package that builds a stacked ensemble model by combining multiple H2O models. It leverages the autoEnsemble package to stack a set of trained models (e.g., from an HMDA grid) into a stronger meta-learner. For more details on autoEnsemble, see the GitHub repository at https://github.com/haghish/autoEnsemble or the autoEnsemble package on CRAN.
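As noted above, autoEnsemble is distributed on CRAN. A minimal setup sketch, assuming the packages are installed from their default CRAN repositories (the installation source of HMDA itself is an assumption here):

# One-time setup; CRAN availability of HMDA is assumed, autoEnsemble is on CRAN as described above
install.packages(c("HMDA", "h2o", "autoEnsemble"))

library(HMDA)
library(h2o)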
hmda.autoEnsemble(
  models,
  training_frame,
  newdata = NULL,
  family = "binary",
  strategy = c("search"),
  model_selection_criteria = c("auc", "aucpr", "mcc", "f2"),
  min_improvement = 1e-05,
  max = NULL,
  top_rank = seq(0.01, 0.99, 0.01),
  stop_rounds = 3,
  reset_stop_rounds = TRUE,
  stop_metric = "auc",
  seed = -1,
  verbatim = FALSE
)
models
A grid object, such as an HMDA grid, or a character vector of H2O model IDs (both forms are shown in the sketch after this argument list).
training_frame
An H2OFrame (or data frame already uploaded to the H2O server) that contains the training data used to build the base models.
newdata
An H2OFrame (or data frame already uploaded to the H2O server) to be used for evaluating the ensemble. If not specified, performance on the training data is used (for instance, cross-validation performance).
family
A character string specifying the model family. The default is "binary".
strategy
A character vector specifying the ensemble strategy. The available strategy is "search" (the default).
model_selection_criteria
A character vector specifying the performance metrics to consider for model selection. The default is c("auc", "aucpr", "mcc", "f2").
min_improvement
Numeric. The minimum improvement in the evaluation metric required to continue the ensemble search.
max
Integer. The maximum number of models for each selection criterion. The default is NULL.
top_rank
Numeric vector. Specifies the percentage (or percentages) of the top models that should be considered for ensemble selection. With the "search" strategy, these percentages are searched to find the best-performing combination of models. The default is seq(0.01, 0.99, 0.01).
stop_rounds
Integer. The number of consecutive rounds with no improvement in the performance metric before stopping the search.
reset_stop_rounds
Logical. If TRUE (the default), the stopping counter is reset whenever the performance metric improves.
stop_metric
Character. The metric used for early stopping; the default is "auc".
seed
Integer. A random seed for reproducibility. The default is -1.
verbatim
Logical. If TRUE, additional progress information is printed during the ensemble search. The default is FALSE.
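A brief sketch of the two accepted forms of the models argument, assuming a trained grid named hmda_grid1 (as in the example below) that behaves like a standard H2O grid object with a model_ids slot:

# Form 1: pass the grid object directly
meta_a <- hmda.autoEnsemble(models = hmda_grid1, training_frame = train)

# Form 2: pass a character vector of H2O model IDs extracted from the grid
ids    <- unlist(hmda_grid1@model_ids)
meta_b <- hmda.autoEnsemble(models = ids, training_frame = train)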
This wrapper function integrates with the HMDA package workflow to build a
stacked ensemble model from a set of base H2O models. It calls the
ensemble()
function from the autoEnsemble package to construct the
ensemble. The function is designed to work within HMDA's framework, where base
models are generated via grid search or AutoML. For more details on the autoEnsemble
approach, see https://github.com/haghish/autoEnsemble.
The ensemble strategy "search"
(default) searches for the best combination
of top-performing and diverse models to improve overall performance. The wrapper
returns both the final ensemble model and the list of top-ranked models used in the
ensemble.
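As a hedged sketch of how the search-related arguments fit together (the argument values below are illustrative choices rather than recommendations; hmda_grid1 and train are assumed to exist as in the example below):

meta <- hmda.autoEnsemble(
  models                   = hmda_grid1,
  training_frame           = train,
  strategy                 = "search",
  model_selection_criteria = c("auc", "aucpr", "mcc", "f2"),
  top_rank                 = seq(0.01, 0.50, 0.01), # restrict the search to the top half of ranked models
  stop_rounds              = 5,                     # tolerate more rounds without improvement
  stop_metric              = "auc",
  seed                     = 2025
)

# The returned list contains the ensemble model and the top-ranked base models
ensemble_model <- meta$model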
A list containing:
model: The stacked ensemble model built by autoEnsemble (accessed as meta$model in the examples below).
A data frame of the top-ranked base models that were used in building the ensemble.
E. F. Haghish
## Not run:
library(HMDA)
library(h2o)
hmda.init()
# Import a sample binary outcome dataset into H2O
train <- h2o.importFile(
"https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_train_10k.csv")
test <- h2o.importFile(
"https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_test_5k.csv")
# Identify predictors and response
y <- "response"
x <- setdiff(names(train), y)
# For binary classification, response should be a factor
train[, y] <- as.factor(train[, y])
test[, y] <- as.factor(test[, y])
# Define the GBM hyperparameter grid
params <- list(learn_rate = c(0.01, 0.1),
               max_depth = c(3, 5, 9),
               sample_rate = c(0.8, 1.0)
)
# Train and validate a cartesian grid of GBMs
hmda_grid1 <- hmda.grid(algorithm = "gbm", x = x, y = y,
                        grid_id = "hmda_grid1",
                        training_frame = train,
                        nfolds = 10,
                        ntrees = 100,
                        seed = 1,
                        hyper_params = params)
# Assess the performances of the models
grid_performance <- hmda.grid.analysis(hmda_grid1)
# Return the best 2 models according to each metric
hmda.best.models(grid_performance, n_models = 2)
# Build an autoEnsemble model and evaluate it on the test set
meta <- hmda.autoEnsemble(models = hmda_grid1, training_frame = train)
print(h2o.performance(model = meta$model, newdata = test))
## End(Not run)
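As a follow-up sketch (also not run), the held-out test frame could be supplied through the newdata argument so that the ensemble is evaluated on it directly; this reuses the objects created in the example above:

## Not run:
meta2 <- hmda.autoEnsemble(models = hmda_grid1,
                           training_frame = train,
                           newdata = test)
print(h2o.performance(model = meta2$model, newdata = test))
## End(Not run)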