View source: R/hmda.grid.analysis.R
| hmda.grid.analysis | R Documentation | 
Reorders an HMDA grid based on a specified performance metric and supplements the grid's summary table with additional performance metrics extracted via cross-validation. The function returns a data frame of performance metrics for each model in the grid, which enables a detailed comparison of models across metrics such as logloss, AUC, and RMSE.
hmda.grid.analysis(
  grid,
  performance_metrics = c("logloss", "mse", "rmse", "rmsle", "auc", "aucpr",
    "mean_per_class_error", "r2"),
  sort_by = "logloss"
)
| grid | An HMDA grid object from which the performance summary will be extracted. | 
| performance_metrics | A character vector of additional performance metric names to be included in the analysis. Default is c("logloss", "mse", "rmse", "rmsle", "auc", "aucpr", "mean_per_class_error", "r2"). | 
| sort_by | A character string indicating the performance metric to sort the grid by. Default is "logloss". | 
The function performs the following steps:
1. Grid Reordering: It calls h2o.getGrid() to reorder the grid based on the sort_by metric. For metrics like "logloss", "mse", "rmse", and "rmsle", sorting is in ascending order; for others, it is in descending order.
2. Performance Table Extraction: The grid's summary table is converted into a data frame.
3. Additional Metric Calculation: For each metric specified in performance_metrics (other than the one used for sorting), the function initializes a column with NA values and iterates over each model in the grid (via its model_ids) to extract the corresponding cross-validated performance metric using functions such as h2o.auc(), h2o.rmse(), etc. For threshold-based metrics (e.g., f1, f2, mcc, kappa), it retrieves performance via h2o.performance().
4. Return: The function returns the merged data frame of performance metrics.
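The sorting rule and the column-filling loop described in the steps above can be illustrated with a minimal base R sketch. It does not call h2o and is not the package's implementation: the helper sort_decreasing(), the mock summary table, and the lookup vector mock_cv_auc are hypothetical stand-ins with made-up values, used only to show the logic.

```r
# Metrics where smaller is better, as listed in the description above
lower_is_better <- c("logloss", "mse", "rmse", "rmsle")

# Hypothetical helper: TRUE when the sort should be in descending order
sort_decreasing <- function(metric) !(metric %in% lower_is_better)

# Mock summary table standing in for the grid's summary (made-up values)
summary_table <- data.frame(
  model_ids = c("m1", "m2", "m3"),
  logloss   = c(0.52, 0.48, 0.61),
  stringsAsFactors = FALSE
)

# Mock cross-validated AUC per model, standing in for calls such as h2o.auc()
mock_cv_auc <- c(m1 = 0.78, m2 = 0.81, m3 = 0.70)

# Step 1: order the table by the sort_by metric
sort_by <- "logloss"
ord <- order(summary_table[[sort_by]], decreasing = sort_decreasing(sort_by))
summary_table <- summary_table[ord, , drop = FALSE]

# Step 3: initialize the extra column with NA, then fill it model by model
summary_table$auc <- NA_real_
for (i in seq_len(nrow(summary_table))) {
  id <- summary_table$model_ids[i]
  summary_table$auc[i] <- mock_cv_auc[[id]]
}

print(summary_table)
```

Because the column starts as NA, a model for which a metric cannot be retrieved would simply keep NA in that column rather than break the table.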
A data frame of class "hmda.grid.analysis" that contains the merged
performance summary table. This table includes the default metrics from the grid
summary along with the additional metrics specified by performance_metrics
(if available). The data frame is sorted according to the sort_by metric.
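Since the returned object is a data frame, it can be re-ranked by any metric column with ordinary data frame tools. The sketch below uses a mock object with made-up values; the exact class vector shown is an assumption made for illustration, not something confirmed by the package.

```r
# Mock stand-in for the returned object (made-up values; the class vector
# c("hmda.grid.analysis", "data.frame") is an assumption)
grid_performance <- structure(
  data.frame(
    model_ids = c("m2", "m1", "m3"),
    logloss   = c(0.48, 0.52, 0.61),
    auc       = c(0.79, 0.81, 0.70),
    stringsAsFactors = FALSE
  ),
  class = c("hmda.grid.analysis", "data.frame")
)

# The object still behaves like an ordinary data frame
is_df <- inherits(grid_performance, "data.frame")

# Re-rank by AUC (higher is better) and keep the top two rows
by_auc <- grid_performance[order(grid_performance$auc, decreasing = TRUE), ]
top2_auc <- head(by_auc, 2)
print(top2_auc)
```

Here the table arrives sorted by logloss, and the re-ranking shows that the best model by one metric need not be the best by another.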
E. F. Haghish
## Not run: 
  # NOTE: This example may take a long time to run on your machine
  # Initialize H2O (if not already running)
  library(HMDA)
  library(h2o)
  hmda.init()
  # Import a sample binary outcome train/test set into H2O
  train <- h2o.importFile(
    "https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_train_10k.csv")
  test <- h2o.importFile(
    "https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_test_5k.csv")
  # Identify predictors and response
  y <- "response"
  x <- setdiff(names(train), y)
  # For binary classification, response should be a factor
  train[, y] <- as.factor(train[, y])
  test[, y] <- as.factor(test[, y])
  # Run the hyperparameter search using the GBM algorithm
  result <- hmda.search.param(algorithm = c("gbm"),
                              x = x,
                              y = y,
                              training_frame = train,
                              max_models = 100,
                              nfolds = 10,
                              stopping_metric = "AUC",
                              stopping_rounds = 3)
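  # Assess the performance of the models
  # NOTE: 'gbm_grid1' is not created above; replace it with the HMDA grid
  # object obtained from your own grid search
  grid_performance <- hmda.grid.analysis(gbm_grid1)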
  # Return the best 2 models according to each metric
  hmda.best.models(grid_performance, n_models = 2)
## End(Not run)