View source: R/hmda.grid.analysis.R
hmda.grid.analysis (R Documentation)
Reorders an HMDA grid based on a specified performance metric and supplements the grid's summary table with additional performance metrics extracted via cross-validation. The function returns a data frame of performance metrics for each model in the grid. This enables a detailed analysis of model performance across various metrics such as logloss, AUC, RMSE, etc.
hmda.grid.analysis(
grid,
performance_metrics = c("logloss", "mse", "rmse", "rmsle", "auc", "aucpr",
"mean_per_class_error", "r2"),
sort_by = "logloss"
)
grid: An HMDA grid object from which the performance summary will be extracted.

performance_metrics: A character vector of additional performance metric names to be included in the analysis. Default is c("logloss", "mse", "rmse", "rmsle", "auc", "aucpr", "mean_per_class_error", "r2").

sort_by: A character string indicating the performance metric to sort the grid by. Default is "logloss".
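For illustration, a hypothetical call that sorts by AUC and requests a smaller set of extra metrics (assuming grid holds an existing HMDA grid object) could look like:

# hypothetical call; 'grid' is assumed to be an HMDA grid already in memory
res <- hmda.grid.analysis(grid,
                          performance_metrics = c("auc", "rmse"),
                          sort_by = "auc")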
The function performs the following steps:
Grid Reordering: It calls h2o.getGrid() to reorder the grid based on the sort_by metric. For metrics like "logloss", "mse", "rmse", and "rmsle", sorting is in ascending order; for others, it is in descending order. A sketch of this call follows below.
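A minimal sketch of the reordering step, assuming a grid registered in H2O under the placeholder id "my_grid" (the id and variable names are illustrative, not the package's internals):

library(h2o)
# metrics documented above as sorted in ascending order
ascending <- c("logloss", "mse", "rmse", "rmsle")
sort_by <- "logloss"
sorted_grid <- h2o.getGrid(grid_id = "my_grid",     # illustrative id
                           sort_by = sort_by,
                           decreasing = !(sort_by %in% ascending))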
Performance Table Extraction: The grid's summary table is converted into a data frame.
Additional Metric Calculation: For each metric specified in performance_metrics (other than the one used for sorting), the function initializes a column with NA values and iterates over each model in the grid (via its model_ids) to extract the corresponding cross-validated performance metric using functions such as h2o.auc(), h2o.rmse(), etc. For threshold-based metrics (e.g., f1, f2, mcc, kappa), it retrieves performance via h2o.performance(). A sketch of this loop, which also covers the table extraction above, is shown below.
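A hedged sketch of the table extraction and the per-model loop for one extra metric (AUC); the variable names and guard clauses are assumptions for illustration, not the package's actual code:

library(h2o)
# convert the grid's summary table to a data frame
df <- as.data.frame(sorted_grid@summary_table)
# initialize the extra metric column, then fill it per model
df$auc <- NA
for (i in seq_along(sorted_grid@model_ids)) {
  model <- h2o.getModel(sorted_grid@model_ids[[i]])
  # cross-validated AUC; guarded for models where it is unavailable
  val <- tryCatch(h2o.auc(model, xval = TRUE), error = function(e) NA)
  if (!is.null(val)) df$auc[i] <- val
}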
Return: The function returns the merged data frame of performance metrics.
A data frame of class "hmda.grid.analysis"
that contains the merged
performance summary table. This table includes the default metrics from the grid
summary along with the additional metrics specified by performance_metrics
(if available). The data frame is sorted according to the sort_by
metric.
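The returned object can be inspected like an ordinary data frame; for example (illustrative, assuming the result is stored in grid_performance):

head(grid_performance)   # top-ranked models under the sort_by metric
class(grid_performance)  # "hmda.grid.analysis"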
E. F. Haghish
## Not run:
# NOTE: This example may take a long time to run on your machine
# Initialize H2O (if not already running)
library(HMDA)
library(h2o)
hmda.init()
# Import a sample binary outcome train/test set into H2O
train <- h2o.importFile(
"https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_train_10k.csv")
test <- h2o.importFile(
"https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_test_5k.csv")
# Identify predictors and response
y <- "response"
x <- setdiff(names(train), y)
# For binary classification, response should be a factor
train[, y] <- as.factor(train[, y])
test[, y] <- as.factor(test[, y])
# Run the hyperparameter search using the GBM algorithm.
result <- hmda.search.param(algorithm = c("gbm"),
x = x,
y = y,
training_frame = train,
max_models = 100,
nfolds = 10,
stopping_metric = "AUC",
stopping_rounds = 3)
# Assess the performance of the models
grid_performance <- hmda.grid.analysis(result)
# Return the best 2 models according to each metric
hmda.best.models(grid_performance, n_models = 2)
## End(Not run)