shapley: Weighted Mean SHAP (WMSHAP) and Confidence Interval for Multiple Models

View source: R/shapley.R

shapley    R Documentation

Weighted Mean SHAP (WMSHAP) and Confidence Interval for Multiple Models (tuning grid, stacked ensemble, etc.)

Description

Computes Weighted Mean SHAP ratios (WMSHAP) and confidence intervals to assess feature importance across a collection of models (e.g., an H2O grid/AutoML leaderboard or the base-learners of an ensemble). Instead of reporting SHAP contributions for a single model, this function summarizes feature importance across multiple models, weighting each model by a chosen performance metric. Currently, only models trained with the h2o machine learning platform or with the autoEnsemble and HMDA R packages are supported.

Usage

shapley(
  models,
  newdata,
  plot = TRUE,
  performance_metric = "r2",
  standardize_performance_metric = FALSE,
  performance_type = "xval",
  minimum_performance = 0,
  method = "mean",
  cutoff = 0.01,
  top_n_features = NULL,
  n_models = 10,
  sample_size = NULL
)

Arguments

models

An H2O AutoML object, H2O grid object, autoEnsemble object, or a character vector of H2O model IDs.

newdata

An H2OFrame (an H2O data frame already uploaded to the h2o server). SHAP contributions are computed on this data.

plot

Logical. If TRUE, plots the WMSHAP summary (via shapley.plot()).

performance_metric

Character. Performance metric used to weight models. Options are "r2" (regression), "aucpr", "auc", and "f2" (classification metrics).

standardize_performance_metric

Logical. If TRUE, rescales model weights so the weights sum to the number of included models. The default is FALSE.

performance_type

Character. Which performance estimate to use: "train" for training data, "valid" for validation data, or "xval" for cross-validation (default).

minimum_performance

Numeric. Minimum performance a model must reach to be included in the WMSHAP calculation. Models below this threshold receive zero weight and are excluded. The default is 0. Setting a minimum performance restricts the WMSHAP computation to a set of competitive models.

method

Character. Method for selecting important features based on their WMSHAP. The default, "mean", selects features whose WMSHAP exceeds the cutoff. The alternative, "lowerCI", selects features whose lower confidence-interval bound exceeds the cutoff.

cutoff

Numeric. Cutoff applied by method for selecting important features.

top_n_features

Integer or NULL. If not NULL, restricts SHAP computation to the top N features per model (reduces runtime). This also selects the top N features by WMSHAP in the returned selectedFeatures.

n_models

Integer. Minimum number of models that must meet the performance threshold for WMSHAP and CI computation. Use 1 to compute summary SHAP for a single model. The default is 10.

sample_size

Integer. Number of rows in newdata used for SHAP assessment. Defaults to all rows. Reducing this can speed up development runs.
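The method and cutoff arguments above can be illustrated with a toy summary table. This is a hedged sketch in base R: the column names (feature, mean, lowerCI) mirror the summaryShaps output described under Value, but the values are made up for illustration.

```r
# Toy feature-level summary (column names assumed from summaryShaps)
summaryShaps <- data.frame(feature = c("x1", "x2", "x3"),
                           mean    = c(0.50, 0.30, 0.005),
                           lowerCI = c(0.40, 0.005, 0.001))
cutoff <- 0.01

# method = "mean": keep features whose WMSHAP exceeds the cutoff
summaryShaps$feature[summaryShaps$mean > cutoff]     # "x1" "x2"

# method = "lowerCI": stricter rule; the confidence-interval lower
# bound itself must exceed the cutoff
summaryShaps$feature[summaryShaps$lowerCI > cutoff]  # "x1"
```

The "lowerCI" rule is more conservative: a feature with a high mean but wide confidence interval (here, x2) is dropped.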

Details

The function works as follows:

  1. For each model, SHAP contributions are computed on newdata.

  2. For each model, feature-level absolute SHAP contributions are aggregated and converted to a ratio (share of total absolute SHAP across features).

  3. Models are weighted by a performance metric (e.g., "r2" for regression or "auc" / "aucpr" for classification).

  4. The weighted mean SHAP ratio (WMSHAP) is computed for each feature, along with a confidence interval across models.
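Steps 2-4 can be sketched in base R without h2o. This is a hedged illustration: the shap matrix and performance values are made up, and the normal-approximation confidence interval shown here is one plausible weighted-CI formula; the package's exact formula may differ.

```r
# Hypothetical models-by-features matrix of mean |SHAP| contributions
set.seed(1)
shap <- matrix(abs(rnorm(12)), nrow = 3,
               dimnames = list(paste0("model", 1:3), paste0("x", 1:4)))

# Step 2: convert each model's absolute SHAP values to ratios (rows sum to 1)
ratios <- shap / rowSums(shap)

# Step 3: weight models by a performance metric (e.g., r2), normalized to sum to 1
perf <- c(0.81, 0.78, 0.65)
w    <- perf / sum(perf)

# Step 4: weighted mean SHAP ratio per feature, with a normal-approximation CI
wmshap <- colSums(ratios * w)
wsd    <- sqrt(colSums(w * sweep(ratios, 2, wmshap)^2))
ci     <- 1.96 * wsd / sqrt(nrow(ratios))
data.frame(feature = colnames(ratios), wmshap = wmshap,
           lowerCI = wmshap - ci, upperCI = wmshap + ci)
```

Because each model's ratios sum to 1 and the weights sum to 1, the WMSHAP values themselves sum to 1 across features, so they can be read as shares of total importance.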

Value

An object of class "shapley" (a named list) containing:

ids

Character vector of model IDs originally supplied or extracted.

included_models

Character vector of model IDs included after filtering by performance.

ignored_models

Data frame of excluded models and their performance.

weights

Numeric vector of model weights (performance metrics) for included models.

results

Data frame of row-level SHAP contributions merged across models.

summaryShaps

Data frame of feature-level WMSHAP means and confidence intervals.

selectedFeatures

Character vector of selected important features.

feature_importance

List of per-feature absolute contribution summaries by model.

contributionPlot

A ggplot-like object returned by h2o.shap_summary_plot() used for the WMSHAP (“wmshap”) style plot.

plot

A ggplot object (bar plot) if plot = TRUE, otherwise NULL.
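The returned structure can be sketched with a mock object. The field names below come from the Value section above; the contents are illustrative only, not actual output.

```r
# Mock 'shapley' result (field names from the Value section; values invented)
result <- list(
  summaryShaps = data.frame(feature = c("PSA", "AGE"),
                            mean    = c(0.41, 0.22),
                            lowerCI = c(0.35, 0.18),
                            upperCI = c(0.47, 0.26)),
  selectedFeatures = c("PSA", "AGE")
)

result$summaryShaps      # feature-level WMSHAP means and confidence intervals
result$selectedFeatures  # features passing the method/cutoff rule
```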

Author(s)

E. F. Haghish

Examples


## Not run: 
# load the required libraries for building the base-learners and the ensemble models
library(h2o)            #shapley supports h2o models
library(shapley)

# initiate the h2o server
h2o.init(ignore_config = TRUE, nthreads = 2, bind_to_localhost = FALSE, insecure = TRUE)

# upload data to h2o cloud
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
prostate <- h2o.importFile(path = prostate_path, header = TRUE)

set.seed(10)

### H2O provides two approaches to model tuning: AutoML and Grid search.
### Below, weighted mean SHAP values are computed for both.

#######################################################
### PREPARE AutoML Grid (takes a couple of minutes)
#######################################################
# run AutoML to tune various models (GBM) for up to 120 seconds
y <- "CAPSULE"
prostate[,y] <- as.factor(prostate[,y])  #convert to factor for classification
aml <- h2o.automl(y = y, training_frame = prostate, max_runtime_secs = 120,
                  include_algos = c("GBM"),

                 # this setting ensures the models are comparable for building a meta learner
                 seed = 2023, nfolds = 10,
                 keep_cross_validation_predictions = TRUE)

### call 'shapley' function to compute the weighted mean and weighted confidence intervals
### of SHAP values across all trained models.
### Note that the 'newdata' should be the testing dataset!
result <- shapley(models = aml, newdata = prostate, performance_metric = "aucpr", plot = TRUE)

#######################################################
### PREPARE H2O Grid (takes a couple of minutes)
#######################################################
# make sure equal number of "nfolds" is specified for different grids
grid <- h2o.grid(algorithm = "gbm", y = y, training_frame = prostate,
                 hyper_params = list(ntrees = seq(1,50,1)),
                 grid_id = "ensemble_grid",

                 # this setting ensures the models are comparable for building a meta learner
                 seed = 2023, fold_assignment = "Modulo", nfolds = 10,
                 keep_cross_validation_predictions = TRUE)

result2 <- shapley(models = grid, newdata = prostate, performance_metric = "aucpr", plot = TRUE)

#######################################################
### PREPARE autoEnsemble STACKED ENSEMBLE MODEL
#######################################################

### get the models' IDs from the AutoML and grid searches.
### this is all that is needed before building the ensemble,
### i.e., to specify the model IDs that should be evaluated.
library(autoEnsemble)
ids    <- c(h2o.get_ids(aml), h2o.get_ids(grid))
autoSearch <- ensemble(models = ids, training_frame = prostate, strategy = "search")
result3 <- shapley(models = autoSearch, newdata = prostate,
                   performance_metric = "aucpr", plot = TRUE)



## End(Not run)

shapley documentation built on March 4, 2026, 9:06 a.m.