View source: R/hmda.best.models.R
hmda.best.models | R Documentation |
Scans an HMDA grid analysis data frame for H2O performance
metric columns and, for each metric, selects the top n_models
best-performing models based on the proper optimization direction
(i.e., lower values are better for some metrics and higher values
are better for others). The function then returns a summary data frame
showing the union of these best models (without duplication) along with
the corresponding metric values that led to their selection.
hmda.best.models(df, n_models = 1)
df |
A data frame of class |
n_models |
Integer. The number of top models to select per metric. Default is 1. |
The function uses a predefined set of H2O performance metrics along with their desired optimization directions:
Lower values are better.
Higher values are better.
For each metric in the predefined list that exists in df and is not
entirely NA, the function orders the values (using order()) according
to whether lower or higher values indicate better performance. It then selects
the top n_models model IDs for that metric. The union of these model IDs
is used to subset the original data frame. The returned data frame includes
the model_ids column and the performance metric columns (from the
predefined list) that were found in the input data frame.
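The selection logic described above can be sketched in base R. This is an illustration only, not the package's source: the helper name select_best, the metric_direction table, and the toy data frame are all hypothetical, and the actual predefined metric list may differ.

```r
# Hypothetical direction table (assumption): the real function's
# predefined metric list and directions may differ.
metric_direction <- c(logloss = "lower", rmse = "lower", auc = "higher")

# Toy stand-in for a grid analysis data frame (made-up values).
df <- data.frame(
  model_ids = c("m1", "m2", "m3", "m4"),
  logloss   = c(0.40, 0.35, 0.50, 0.45),
  auc       = c(0.80, 0.78, 0.85, 0.70),
  rmse      = NA_real_,               # entirely NA, so it is skipped
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the described steps.
select_best <- function(df, n_models = 1, directions = metric_direction) {
  # Keep only metrics that exist in df and are not entirely NA.
  usable <- Filter(
    function(m) m %in% names(df) && !all(is.na(df[[m]])),
    names(directions)
  )

  best_ids <- character(0)
  for (m in usable) {
    # Sort descending when higher is better, ascending otherwise;
    # na.last = NA drops rows with a missing value for this metric.
    ord <- order(df[[m]],
                 decreasing = directions[[m]] == "higher",
                 na.last = NA)
    # Accumulate the union of the top n_models IDs per metric.
    best_ids <- union(best_ids, df$model_ids[head(ord, n_models)])
  }

  # Subset to the selected models and the metric columns that were used.
  df[df$model_ids %in% best_ids, c("model_ids", usable), drop = FALSE]
}

best <- select_best(df, n_models = 1)
print(best)
```

In this toy data, m2 has the lowest logloss and m3 the highest auc, so the result contains those two rows. Because the result is a union, it can hold anywhere from n_models rows up to n_models times the number of usable metrics.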
A data frame containing the rows corresponding to the union of
best model IDs (across all metrics) and the columns for model_ids
plus the performance metrics that are present in the data frame.
E. F. Haghish
## Not run:
# Example: Create a hyperparameter grid for GBM models.
predictors <- c("var1", "var2", "var3")
response <- "target"
# Define hyperparameter ranges
hyper_params <- list(
  ntrees = seq(50, 150, by = 25),
  max_depth = c(5, 10, 15),
  learn_rate = c(0.01, 0.05, 0.1),
  sample_rate = c(0.8, 1.0),
  col_sample_rate = c(0.8, 1.0)
)
# Run the grid search
grid <- hmda.grid(
  algorithm = "gbm",
  x = predictors,
  y = response,
  training_frame = h2o.getFrame("hmda.train.hex"),
  hyper_params = hyper_params,
  nfolds = 10,
  stopping_metric = "AUTO"
)
# Assess the performances of the models
grid_performance <- hmda.grid.analysis(grid)
# Return the best 2 models according to each metric
hmda.best.models(grid_performance, n_models = 2)
## End(Not run)