shapley.row.plot: WMSHAP row-level plot for a single observation (participant...

View source: R/shapley.row.plot.R

shapley.row.plot    R Documentation

WMSHAP row-level plot for a single observation (participant or data row)

Description

Computes and visualizes Weighted Mean SHAP (WMSHAP) contributions for a single row (subject/observation) across multiple models in a shapley object. For each feature, the function computes a weighted mean of the row-level SHAP contributions across models, using shapley$weights as the weights, and reports an approximate 95% confidence interval summarizing variability across models.
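
The weighted-mean step described above can be illustrated with a small base-R sketch. All numbers, the object names (shap_row, w, summary_tbl), and in particular the interval formula (a weighted variance across models with a normal approximation) are assumptions made for illustration; the package may compute its interval differently.

```r
# Toy row-level SHAP contributions for ONE observation (made-up numbers,
# not package output): rows are models, columns are features.
shap_row <- matrix(
  c(0.20, -0.05,
    0.30,  0.01,
    0.25, -0.02),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("model", 1:3), c("PSA", "AGE"))
)

# Hypothetical model weights, e.g. one performance score per model.
w <- c(0.70, 0.80, 0.75)

# Weighted mean contribution per feature across the models.
wmshap <- apply(shap_row, 2, weighted.mean, w = w)

# ASSUMED interval: weighted variance across models, then a normal
# approximation (mean +/- 1.96 * standard error).
w_norm <- w / sum(w)
w_var  <- colSums(w_norm * sweep(shap_row, 2, wmshap)^2)
se     <- sqrt(w_var / nrow(shap_row))

summary_tbl <- data.frame(
  feature = colnames(shap_row),
  mean    = wmshap,
  lower   = wmshap - 1.96 * se,
  upper   = wmshap + 1.96 * se,
  row.names = NULL
)
print(summary_tbl)
```

Models with larger weights pull the per-feature mean toward their own SHAP contribution, which is the point of weighting by model performance.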

Usage

shapley.row.plot(
  shapley,
  row_index,
  top_n_features = NULL,
  features = NULL,
  nonzeroCI = FALSE,
  plot = TRUE,
  print = FALSE
)

Arguments

shapley

object of class "shapley", as returned by the 'shapley' function

row_index

Integer (length 1). The row/subject identifier to visualize. This is matched against the index column in shapley$results.

top_n_features

Integer. If specified, only the top n features with the highest weighted mean SHAP values are selected. This argument is overridden by the 'features' argument.

features

Optional character vector of feature names to plot. If NULL, all available features in shapley$results are used. Specifying the features argument will override the top_n_features argument.

nonzeroCI

Logical. If TRUE, features whose confidence interval crosses zero are not plotted.

plot

Logical. If TRUE, prints the plot.

print

Logical. If TRUE, prints the computed summary table for the row.
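
How the selection arguments interact can be sketched on a toy summary table. The helper select_features(), the column names (feature, mean, lower, upper), and the values are hypothetical and are not part of the package. Ranking by absolute magnitude and applying the nonzeroCI filter after feature selection are also assumptions about behavior that the documentation does not spell out.

```r
# Toy per-feature summary for one row (hypothetical columns, made-up values).
tbl <- data.frame(
  feature = c("PSA", "AGE", "GLEASON", "VOL"),
  mean    = c(0.25, -0.02, 0.40, 0.05),
  lower   = c(0.18, -0.06, 0.31, -0.01),
  upper   = c(0.32,  0.02, 0.49,  0.11)
)

# Hypothetical helper mirroring the documented argument precedence.
select_features <- function(tbl, top_n_features = NULL, features = NULL,
                            nonzeroCI = FALSE) {
  if (!is.null(features)) {
    # 'features' takes precedence over 'top_n_features'
    tbl <- tbl[tbl$feature %in% features, , drop = FALSE]
  } else if (!is.null(top_n_features)) {
    # ASSUMPTION: rank by absolute weighted mean contribution
    tbl <- tbl[order(abs(tbl$mean), decreasing = TRUE), , drop = FALSE]
    tbl <- head(tbl, top_n_features)
  }
  if (nonzeroCI) {
    # keep only features whose interval lies entirely on one side of zero
    tbl <- tbl[tbl$lower > 0 | tbl$upper < 0, , drop = FALSE]
  }
  tbl
}

top2      <- select_features(tbl, top_n_features = 2)$feature
overrides <- select_features(tbl, top_n_features = 2, features = "AGE")$feature
nonzero   <- select_features(tbl, nonzeroCI = TRUE)$feature
print(list(top2 = top2, overrides = overrides, nonzero = nonzero))
```

In this toy table, AGE and VOL are dropped by the nonzeroCI filter because their intervals include zero.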

Value

A list containing the ggplot2 object and the data frame of WMSHAP summary values.

Author(s)

E. F. Haghish

Examples


## Not run: 
# load the required libraries for building the base-learners and the ensemble models
library(h2o)            #shapley supports h2o models
library(shapley)

# initiate the h2o server
h2o.init(ignore_config = TRUE, nthreads = 2, bind_to_localhost = FALSE,
         insecure = TRUE)

# upload data to h2o cloud
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
prostate <- h2o.importFile(path = prostate_path, header = TRUE)

set.seed(10)

### H2O provides 2 types of grid search for tuning the models, which are
### AutoML and Grid. Below, I demonstrate how weighted mean shapley values
### can be computed for both types.

#######################################################
### EXAMPLE 1: PREPARE AutoML Grid (takes a couple of minutes)
#######################################################
# run AutoML to tune various models (GBM) for up to 120 seconds
y <- "CAPSULE"
prostate[,y] <- as.factor(prostate[,y])  #convert to factor for classification
aml <- h2o.automl(y = y, training_frame = prostate, max_runtime_secs = 120,
                 include_algos=c("GBM"),

                 seed = 2023, nfolds = 10,
                 keep_cross_validation_predictions = TRUE)

### call 'shapley' function to compute the weighted mean and weighted confidence intervals
### of SHAP values across all trained models.
### Note that 'newdata' should be the testing dataset! (The training frame is
### reused here only to keep the example short.)
result <- shapley(models = aml, newdata = prostate,
                  performance_metric = "aucpr", plot = TRUE)

shapley.row.plot(result, row_index = 11)

#######################################################
### EXAMPLE 2: PREPARE H2O Grid (takes a couple of minutes)
#######################################################
# make sure the same number of "nfolds" is specified for the different grids
grid <- h2o.grid(algorithm = "gbm", y = y, training_frame = prostate,
                 hyper_params = list(ntrees = seq(1,50,1)),
                 grid_id = "ensemble_grid",

                 # this setting ensures the models are comparable for building a meta learner
                 seed = 2023, fold_assignment = "Modulo", nfolds = 10,
                 keep_cross_validation_predictions = TRUE)

result2 <- shapley(models = grid, newdata = prostate,
                   performance_metric = "aucpr", plot = TRUE)

shapley.row.plot(result2, row_index = 9)
shapley.row.plot(result2, row_index = 9, nonzeroCI = TRUE)
shapley.row.plot(result2, row_index = 9, top_n_features = 10)

#######################################################
### EXAMPLE 3: PREPARE autoEnsemble STACKED ENSEMBLE MODEL
#######################################################

### get the models' IDs from the AutoML and grid searches.
### this is all that is needed before building the ensemble,
### i.e., to specify the model IDs that should be evaluated.
library(autoEnsemble)
ids    <- c(h2o.get_ids(aml), h2o.get_ids(grid))
autoSearch <- ensemble(models = ids, training_frame = prostate, strategy = "search")
result3 <- shapley(models = autoSearch, newdata = prostate,
                   performance_metric = "aucpr", plot = TRUE)

#plot all important features
shapley.row.plot(result3, row_index = 13)

#plot only the given features
shapPlot <- shapley.row.plot(result3, row_index = 13, features = c("PSA", "AGE"))

# inspect the computed data for row 13
print(shapPlot$summary)

## End(Not run)

shapley documentation built on March 4, 2026, 9:06 a.m.