shapley: Weighted Mean SHAP (WMSHAP) and Confidence Interval for Multiple Models

View source: R/shapley.R

shapley    R Documentation

Weighted Mean SHAP (WMSHAP) and Confidence Interval for Multiple Models (tuning grid, stacked ensemble, etc.)

Description

Computes Weighted Mean SHAP ratios (WMSHAP) and confidence intervals to assess feature importance across a collection of models (e.g., an H2O grid/AutoML leaderboard or the base-learners of an ensemble). Instead of reporting SHAP contributions for a single model, this function summarizes feature importance across multiple models, weighting each model by a chosen performance metric. Currently, only models trained with the h2o machine learning platform or with the autoEnsemble and HMDA R packages are supported.

Usage

shapley(
  models,
  newdata,
  plot = TRUE,
  performance_metric = "r2",
  standardize_performance_metric = FALSE,
  performance_type = "xval",
  minimum_performance = 0,
  method = "mean",
  cutoff = 0.01,
  top_n_features = NULL,
  n_models = 10,
  sample_size = NULL
)

Arguments

models

An H2O AutoML object, H2O grid object, autoEnsemble object, or a character vector of H2O model IDs.

newdata

An H2OFrame (an H2O data frame already uploaded to the h2o server). SHAP contributions are computed on this data.

plot

Logical. If TRUE, plots the WMSHAP summary (via shapley.plot()).

performance_metric

Character. Performance metric used to weight models. Options are "r2" (regression), "aucpr", "auc", and "f2" (classification metrics).

standardize_performance_metric

Logical. If TRUE, rescales model weights so the weights sum to the number of included models. The default is FALSE.

performance_type

Character. Which performance estimate to use: "train" for training data, "valid" for validation data, or "xval" for cross-validation (default).

minimum_performance

Numeric. Minimum performance a model must reach to be included in the WMSHAP calculation. Models below this threshold receive zero weight and are excluded. The default is 0. Setting a minimum performance restricts the WMSHAP computation to a set of competitive models.

method

Character. Method for selecting important features based on their WMSHAP. The default, "mean", selects features whose WMSHAP exceeds the cutoff. The alternative, "lowerCI", selects features whose lower confidence-interval bound exceeds the cutoff.

cutoff

Numeric. Cutoff applied by method for selecting important features.

top_n_features

Integer or NULL. If not NULL, restricts SHAP computation to the top N features per model (reduces runtime). This also selects the top N features by WMSHAP in the returned selectedFeatures.

n_models

Integer. Minimum number of models that must meet the performance threshold for WMSHAP and CI computation. Use 1 to compute summary SHAP for a single model. The default is 10.

sample_size

Integer. Number of rows in newdata used for SHAP assessment. Defaults to all rows. Reducing this can speed up development runs.
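The method and cutoff arguments above can be illustrated with a toy summary table. This is a hedged sketch in base R: the column names (feature, mean, lowerCI) mirror the summaryShaps output described under Value, but the values are made up for illustration.

```r
# Toy feature-level summary (column names assumed from summaryShaps)
summaryShaps <- data.frame(feature = c("x1", "x2", "x3"),
                           mean    = c(0.50, 0.30, 0.005),
                           lowerCI = c(0.40, 0.005, 0.001))
cutoff <- 0.01

# method = "mean": keep features whose WMSHAP exceeds the cutoff
summaryShaps$feature[summaryShaps$mean > cutoff]     # "x1" "x2"

# method = "lowerCI": stricter rule; the confidence-interval lower
# bound itself must exceed the cutoff
summaryShaps$feature[summaryShaps$lowerCI > cutoff]  # "x1"
```

The "lowerCI" rule is more conservative: a feature with a high mean but wide confidence interval (here, x2) is dropped.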

Details

The function works as follows:

  1. For each model, SHAP contributions are computed on newdata.

  2. For each model, feature-level absolute SHAP contributions are aggregated and converted to a ratio (share of total absolute SHAP across features).

  3. Models are weighted by a performance metric (e.g., "r2" for regression or "auc" / "aucpr" for classification).

  4. The weighted mean SHAP ratio (WMSHAP) is computed for each feature, along with a confidence interval across models.
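Steps 2-4 can be sketched in base R without h2o. This is a hedged illustration: the shap matrix and performance values are made up, and the normal-approximation confidence interval shown here is one plausible weighted-CI formula; the package's exact formula may differ.

```r
# Hypothetical models-by-features matrix of mean |SHAP| contributions
set.seed(1)
shap <- matrix(abs(rnorm(12)), nrow = 3,
               dimnames = list(paste0("model", 1:3), paste0("x", 1:4)))

# Step 2: convert each model's absolute SHAP values to ratios (rows sum to 1)
ratios <- shap / rowSums(shap)

# Step 3: weight models by a performance metric (e.g., r2), normalized to sum to 1
perf <- c(0.81, 0.78, 0.65)
w    <- perf / sum(perf)

# Step 4: weighted mean SHAP ratio per feature, with a normal-approximation CI
wmshap <- colSums(ratios * w)
wsd    <- sqrt(colSums(w * sweep(ratios, 2, wmshap)^2))
ci     <- 1.96 * wsd / sqrt(nrow(ratios))
data.frame(feature = colnames(ratios), wmshap = wmshap,
           lowerCI = wmshap - ci, upperCI = wmshap + ci)
```

Because each model's ratios sum to 1 and the weights sum to 1, the WMSHAP values themselves sum to 1 across features, so they can be read as shares of total importance.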

Value

An object of class "shapley" (a named list) containing:

ids

Character vector of model IDs originally supplied or extracted.

included_models

Character vector of model IDs included after filtering by performance.

ignored_models

Data frame of excluded models and their performance.

weights

Numeric vector of model weights (performance metrics) for included models.

results

Data frame of row-level SHAP contributions merged across models.

summaryShaps

Data frame of feature-level WMSHAP means and confidence intervals.

selectedFeatures

Character vector of selected important features.

feature_importance

List of per-feature absolute contribution summaries by model.

contributionPlot

A ggplot-like object returned by h2o.shap_summary_plot() used for the WMSHAP (“wmshap”) style plot.

plot

A ggplot object (bar plot) if plot = TRUE, otherwise NULL.
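The returned structure can be sketched with a mock object. The field names below come from the Value section above; the contents are illustrative only, not actual output.

```r
# Mock 'shapley' result (field names from the Value section; values invented)
result <- list(
  summaryShaps = data.frame(feature = c("PSA", "AGE"),
                            mean    = c(0.41, 0.22),
                            lowerCI = c(0.35, 0.18),
                            upperCI = c(0.47, 0.26)),
  selectedFeatures = c("PSA", "AGE")
)

result$summaryShaps      # feature-level WMSHAP means and confidence intervals
result$selectedFeatures  # features passing the method/cutoff rule
```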

Author(s)

E. F. Haghish

Examples


## Not run: 
# load the required libraries for building the base-learners and the ensemble models
library(h2o)            #shapley supports h2o models
library(shapley)

# initiate the h2o server
h2o.init(ignore_config = TRUE, nthreads = 2, bind_to_localhost = FALSE, insecure = TRUE)

# upload data to h2o cloud
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
prostate <- h2o.importFile(path = prostate_path, header = TRUE)

set.seed(10)

### H2O provides two approaches to model tuning: AutoML and Grid search.
### Below, weighted mean SHAP values are computed for both.

#######################################################
### PREPARE AutoML Grid (takes a couple of minutes)
#######################################################
# run AutoML to tune various models (GBM) for up to 120 seconds
y <- "CAPSULE"
prostate[,y] <- as.factor(prostate[,y])  #convert to factor for classification
aml <- h2o.automl(y = y, training_frame = prostate, max_runtime_secs = 120,
                  include_algos = c("GBM"),

                 # this setting ensures the models are comparable for building a meta learner
                 seed = 2023, nfolds = 10,
                 keep_cross_validation_predictions = TRUE)

### call 'shapley' function to compute the weighted mean and weighted confidence intervals
### of SHAP values across all trained models.
### Note that the 'newdata' should be the testing dataset!
result <- shapley(models = aml, newdata = prostate, performance_metric = "aucpr", plot = TRUE)

#######################################################
### PREPARE H2O Grid (takes a couple of minutes)
#######################################################
# make sure equal number of "nfolds" is specified for different grids
grid <- h2o.grid(algorithm = "gbm", y = y, training_frame = prostate,
                 hyper_params = list(ntrees = seq(1,50,1)),
                 grid_id = "ensemble_grid",

                 # this setting ensures the models are comparable for building a meta learner
                 seed = 2023, fold_assignment = "Modulo", nfolds = 10,
                 keep_cross_validation_predictions = TRUE)

result2 <- shapley(models = grid, newdata = prostate, performance_metric = "aucpr", plot = TRUE)

#######################################################
### PREPARE autoEnsemble STACKED ENSEMBLE MODEL
#######################################################

### get the models' IDs from the AutoML and grid searches.
### this is all that is needed before building the ensemble,
### i.e., to specify the model IDs that should be evaluated.
library(autoEnsemble)
ids    <- c(h2o.get_ids(aml), h2o.get_ids(grid))
autoSearch <- ensemble(models = ids, training_frame = prostate, strategy = "search")
result3 <- shapley(models = autoSearch, newdata = prostate,
                   performance_metric = "aucpr", plot = TRUE)



## End(Not run)

shapley documentation built on March 4, 2026, 9:06 a.m.