hmda.best.models: Select Best Models Across All Models in HMDA Grid

View source: R/hmda.best.models.R

Select Best Models Across All Models in HMDA Grid

Description

Scans an HMDA grid analysis data frame for H2O performance metric columns and, for each metric found, selects the top n_models models according to that metric's optimization direction (lower values are better for some metrics, higher values for others). The function then returns a summary data frame containing the union of these best models, without duplicates, along with the corresponding metric values that led to their selection.

Usage

hmda.best.models(df, n_models = 1)

Arguments

df

A data frame of class "hmda.grid.analysis" containing model performance results. It must include a column named model_ids and one or more numeric columns representing H2O performance metrics (e.g., logloss, auc, rmse).

n_models

Integer. The number of top models to select per metric. Default is 1.

Details

The function uses a predefined set of H2O performance metrics along with their desired optimization directions:

logloss, mae, mse, rmse, rmsle, mean_per_class_error

Lower values are better.

auc, aucpr, r2, accuracy, f1, mcc, f2

Higher values are better.
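The direction rule above can be expressed as a simple lookup table. The following is a minimal base-R sketch, not HMDA's internal code; the names metric_higher_is_better and best_index are hypothetical. It covers only the single-best case (n_models = 1), using which.min() or which.max(), both of which discard NA values.

```r
# Hypothetical lookup table (not part of HMDA) encoding the directions above:
# TRUE means higher values are better, FALSE means lower values are better.
metric_higher_is_better <- c(
  logloss = FALSE, mae = FALSE, mse = FALSE, rmse = FALSE,
  rmsle = FALSE, mean_per_class_error = FALSE,
  auc = TRUE, aucpr = TRUE, r2 = TRUE, accuracy = TRUE,
  f1 = TRUE, mcc = TRUE, f2 = TRUE
)

# Index of the single best value for a metric; NA if the metric is not in the
# table or all values are NA. which.min()/which.max() discard NA values.
best_index <- function(values, metric) {
  if (!metric %in% names(metric_higher_is_better) || all(is.na(values))) {
    return(NA_integer_)
  }
  if (metric_higher_is_better[[metric]]) which.max(values) else which.min(values)
}

best_index(c(0.40, 0.35, 0.50), "logloss")  # 2 (lowest logloss)
best_index(c(0.80, 0.90, 0.70), "auc")      # 2 (highest auc)
```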

For each metric in the predefined list that exists in df and is not entirely NA, the function sorts the values with order(), in ascending order for lower-is-better metrics and in descending order for higher-is-better metrics, and selects the model IDs of the first n_models rows. The union of the model IDs selected across all metrics is then used to subset the original data frame. The returned data frame includes the model_ids column and those metric columns from the predefined list that were found in the input data frame.
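The procedure above can be sketched in base R on a toy performance table, without an H2O cluster. This is a simplified illustration of the stated rules, not the HMDA source code; the function name select_best_sketch and the toy data frame perf are hypothetical. Note that order() places NA values last by default (na.last = TRUE), so missing values in a partially observed metric are never ranked ahead of observed ones.

```r
# Hypothetical, simplified re-implementation of the selection rule described
# in Details; NOT the HMDA source code.
select_best_sketch <- function(df, n_models = 1) {
  lower_better  <- c("logloss", "mae", "mse", "rmse", "rmsle",
                     "mean_per_class_error")
  higher_better <- c("auc", "aucpr", "r2", "accuracy", "f1", "mcc", "f2")
  metrics <- c(lower_better, higher_better)

  # Keep metrics that exist in df and are not entirely NA
  metrics <- metrics[metrics %in% names(df)]
  metrics <- metrics[vapply(metrics, function(m) !all(is.na(df[[m]])),
                            logical(1))]

  best_ids <- character(0)
  for (m in metrics) {
    # Ascending for lower-is-better, descending for higher-is-better;
    # order() puts NA values last by default (na.last = TRUE)
    idx <- order(df[[m]], decreasing = m %in% higher_better)
    top <- head(idx, n_models)
    # union() accumulates IDs without duplicates
    best_ids <- union(best_ids, as.character(df$model_ids[top]))
  }

  # Subset rows to the union of best IDs; keep model_ids plus metric columns
  df[df$model_ids %in% best_ids, c("model_ids", metrics), drop = FALSE]
}

# Toy performance table (hypothetical values); rmse is entirely NA and is skipped
perf <- data.frame(
  model_ids = c("m1", "m2", "m3", "m4"),
  logloss   = c(0.40, 0.35, 0.50, 0.45),
  auc       = c(0.80, 0.85, 0.90, 0.70),
  rmse      = NA_real_,
  stringsAsFactors = FALSE
)

select_best_sketch(perf, n_models = 1)  # rows m2 (best logloss) and m3 (best auc)
select_best_sketch(perf, n_models = 2)  # rows m1, m2, m3; m2 appears only once
```

With n_models = 2, model m2 ranks in the top two for both logloss and auc, yet it appears only once in the result, which is the union-without-duplication behaviour described above.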

Value

A data frame whose rows correspond to the union of the best model IDs across all metrics and whose columns are model_ids plus the predefined performance metrics that are present in df.

Author(s)

E. F. Haghish

Examples

## Not run: 
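  # Load the packages and start (or connect to) a local H2O cluster
  library(HMDA)
  library(h2o)
  h2o.init()

  # Example: Define the predictors and response for a GBM grid search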
  predictors <- c("var1", "var2", "var3")
  response <- "target"

  # Define hyperparameter ranges
  hyper_params <- list(
    ntrees = seq(50, 150, by = 25),
    max_depth = c(5, 10, 15),
    learn_rate = c(0.01, 0.05, 0.1),
    sample_rate = c(0.8, 1.0),
    col_sample_rate = c(0.8, 1.0)
  )

  # Run the grid search
  grid <- hmda.grid(
    algorithm = "gbm",
    x = predictors,
    y = response,
    training_frame = h2o.getFrame("hmda.train.hex"),  # an existing H2O frame
    hyper_params = hyper_params,
    nfolds = 10,
    stopping_metric = "AUTO"
  )

  # Assess the performance of the models
  grid_performance <- hmda.grid.analysis(grid)

  # Return the best 2 models according to each metric
  hmda.best.models(grid_performance, n_models = 2)

## End(Not run)


HMDA documentation built on April 4, 2025, 6:06 a.m.