hmda.grid.analysis: Analyze Hyperparameter Grid Performance

View source: R/hmda.grid.analysis.R


Analyze Hyperparameter Grid Performance

Description

Reorders an HMDA grid based on a specified performance metric and supplements the grid's summary table with additional performance metrics extracted via cross-validation. The function returns a data frame of performance metrics for each model in the grid. This enables a detailed analysis of model performance across various metrics such as logloss, AUC, RMSE, etc.

Usage

hmda.grid.analysis(
  grid,
  performance_metrics = c("logloss", "mse", "rmse", "rmsle", "auc", "aucpr",
    "mean_per_class_error", "r2"),
  sort_by = "logloss"
)

Arguments

grid

An HMDA grid object from which the performance summary will be extracted.

performance_metrics

A character vector of additional performance metric names to be included in the analysis. Default is c("logloss", "mse", "rmse", "rmsle", "auc", "aucpr", "mean_per_class_error", "r2").

sort_by

A character string indicating the performance metric to sort the grid by. Default is "logloss". For metrics such as logloss, mae, mse, rmse, and rmsle, lower values are better, while for metrics like AUC, AUCPR, and R2, higher values are preferred.
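The sort direction implied by sort_by can be sketched in base R. This is an illustration only: the helper sort_direction() is hypothetical and not part of HMDA, and the metric list simply mirrors the lower-is-better metrics named above.

```r
# Hypothetical helper, for illustration: error-type metrics are minimized,
# all other supported metrics (auc, aucpr, r2) are maximized.
lower_is_better <- c("logloss", "mae", "mse", "rmse", "rmsle",
                     "mean_per_class_error")

sort_direction <- function(metric) {
  if (tolower(metric) %in% lower_is_better) "ascending" else "descending"
}

sort_direction("rmse")  # smaller RMSE is preferred, so ascending
sort_direction("auc")   # larger AUC is preferred, so descending
```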

Details

The function performs the following steps:

  1. Grid Reordering: It calls h2o.getGrid() to reorder the grid based on the sort_by metric. For metrics like "logloss", "mse", "rmse", and "rmsle", sorting is in ascending order; for others, it is in descending order.

  2. Performance Table Extraction: The grid's summary table is converted into a data frame.

  3. Additional Metric Calculation: For each metric specified in performance_metrics (other than the one used for sorting), the function initializes a column with NA values and iterates over each model in the grid (via its model_ids) to extract the corresponding cross-validated performance metric using functions such as h2o.auc(), h2o.rmse(), etc. For threshold-based metrics (e.g., f1, f2, mcc, kappa), it retrieves performance via h2o.performance().

  4. Return: The function returns the merged data frame of performance metrics.
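The metric-extraction loop in step 3 can be sketched in plain R. Here get_metric() is a hypothetical, mocked stand-in for the h2o accessors such as h2o.auc(model, xval = TRUE); the model IDs and values are invented for illustration.

```r
# Sketch of step 3: for each extra metric, initialize an NA column and
# fill it per model. get_metric() is a hypothetical stand-in for the
# cross-validated h2o accessors (e.g., h2o.auc(model, xval = TRUE)).
get_metric <- function(model_id, metric) {
  mock <- list(m1 = c(auc = 0.88, rmse = 0.37),   # mocked lookup table
               m2 = c(auc = 0.81, rmse = 0.43))
  unname(mock[[model_id]][metric])
}

summary_df <- data.frame(model_ids = c("m1", "m2"),
                         logloss   = c(0.42, 0.55))

for (metric in c("auc", "rmse")) {
  summary_df[[metric]] <- NA_real_
  for (i in seq_along(summary_df$model_ids)) {
    summary_df[[metric]][i] <- get_metric(summary_df$model_ids[i], metric)
  }
}
```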

Value

A data frame of class "hmda.grid.analysis" that contains the merged performance summary table. This table includes the default metrics from the grid summary along with the additional metrics specified by performance_metrics (if available). The data frame is sorted according to the sort_by metric.
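As a rough illustration of the returned shape (mock model IDs and metric values, not real output), the table has one row per model and arrives already ordered by the sort_by metric:

```r
# Mock of the merged summary table: one row per model, sorted ascending
# by "logloss" (the default sort_by metric). Values are invented.
perf <- data.frame(
  model_ids = c("GBM_model_3", "GBM_model_1", "GBM_model_2"),
  logloss   = c(0.42, 0.47, 0.55),
  auc       = c(0.88, 0.85, 0.81),
  rmse      = c(0.37, 0.39, 0.43)
)

# Because the table is pre-sorted, the first row is the best model
# under the sorting metric:
best <- perf$model_ids[1]
```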

Author(s)

E. F. Haghish

Examples

## Not run: 
  # NOTE: This example may take a long time to run on your machine

  # Initialize H2O (if not already running)
  library(HMDA)
  library(h2o)
  hmda.init()

  # Import a sample binary outcome train/test set into H2O
  train <- h2o.importFile(
    "https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_train_10k.csv")
  test <- h2o.importFile(
    "https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_test_5k.csv")

  # Identify predictors and response
  y <- "response"
  x <- setdiff(names(train), y)

  # For binary classification, response should be a factor
  train[, y] <- as.factor(train[, y])
  test[, y] <- as.factor(test[, y])

  # Run the hyperparameter search using the GBM algorithm.
  result <- hmda.search.param(algorithm = c("gbm"),
                              x = x,
                              y = y,
                              training_frame = train,
                              max_models = 100,
                              nfolds = 10,
                              stopping_metric = "AUC",
                              stopping_rounds = 3)

  # Assess the performance of the models in the resulting grid
  grid_performance <- hmda.grid.analysis(result)

  # Return the best 2 models according to each metric
  hmda.best.models(grid_performance, n_models = 2)

## End(Not run)


HMDA documentation built on April 4, 2025, 6:06 a.m.