shapley.domain: compute and plot weighted mean SHAP contributions at group...

shapley.domain    R Documentation

compute and plot weighted mean SHAP contributions at group level (factors or domains)

Description

This function computes weighted mean SHAP (WMSHAP) contributions for groups of features (factors or domains) and visualizes them with the selected plot type.

Usage

shapley.domain(
  shapley,
  domains,
  plot = "bar",
  legendstyle = "continuous",
  scale_colour_gradient = NULL,
  print = FALSE
)

Arguments

shapley

object of class 'shapley', as returned by the 'shapley' function

domains

character list, specifying the domains for grouping the features' contributions. Domains are clusters of feature names that can be used to compute WMSHAP at a higher level, along with their 95% confidence intervals, to better understand how a cluster of features influences the outcome. Note that only one of the 'features' or 'domains' arguments can be specified at a time.

plot

character, specifying the type of the plot, which can be either 'bar', 'waffle', or 'shap'. The default is 'bar'.

legendstyle

character, specifying the style of the plot legend, which can be either 'continuous' (default) or 'discrete'. The continuous legend is only applicable to 'shap' plots; all other plots use the 'discrete' legend.

scale_colour_gradient

character vector for specifying the color gradients for the plot.

print

logical. If TRUE, the WMSHAP summary table for the given row is printed.
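
As described for the 'domains' argument above, a domain specification is a named list in which each element is a character vector of feature names. The following base R sketch shows one way to sanity-check such a list before passing it on. The helper check_domains() is hypothetical and not part of the 'shapley' package, and the feature names are typed by hand from the example on this page. This page does not state whether overlapping domains are allowed, so the helper only reports them.

```r
# Hypothetical helper (NOT part of the 'shapley' package): report common
# problems in a 'domains' list before it is passed to shapley.domain().
check_domains <- function(domains, features) {
  # 'domains' is expected to be a non-empty, fully named list
  stopifnot(is.list(domains), length(domains) > 0)
  if (is.null(names(domains)) || any(names(domains) == "")) {
    stop("every domain must have a name")
  }
  # each element is expected to be a character vector of feature names
  if (!all(vapply(domains, is.character, logical(1)))) {
    stop("every domain must be a character vector of feature names")
  }

  assigned <- unlist(domains, use.names = FALSE)

  list(
    # names used in a domain that are not among the known features
    unknown    = setdiff(assigned, features),
    # features that appear in more than one domain (or twice in one)
    duplicated = unique(assigned[duplicated(assigned)]),
    # known features that no domain covers
    unassigned = setdiff(features, assigned)
  )
}

# Illustrative feature names, assumed from the prostate example on this page
features <- c("AGE", "RACE", "DPROS", "DCAPS", "PSA", "VOL", "GLEASON")

domains <- list(Demographic = c("RACE", "AGE"),
                Cancer      = c("VOL", "PSA", "GLEASON"),
                Tests       = c("DPROS", "DCAPS"))

report <- check_domains(domains, features)
str(report)
```

If all three elements of the returned list are empty, every listed feature is known, appears once, and belongs to a domain.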

Value

ggplot object

Author(s)

E. F. Haghish

Examples


## Not run: 
# load the required libraries for building the base-learners and the ensemble models
library(h2o)            #shapley supports h2o models
library(shapley)

# initiate the h2o server
h2o.init(ignore_config = TRUE, nthreads = 2, bind_to_localhost = FALSE, insecure = TRUE)

# upload data to h2o cloud
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
prostate <- h2o.importFile(path = prostate_path, header = TRUE)

### H2O provides 2 types of grid search for tuning the models, which are
### AutoML and Grid. Below, I demonstrate how weighted mean shapley values
### can be computed for both types.

set.seed(10)

#######################################################
### PREPARE AutoML Grid (takes a couple of minutes)
#######################################################
# run AutoML to tune various models (GBM) for 120 seconds
y <- "CAPSULE"
prostate[,y] <- as.factor(prostate[,y])  #convert to factor for classification
aml <- h2o.automl(y = y, training_frame = prostate, max_runtime_secs = 120,
                 include_algos=c("GBM"),

                 # this setting ensures the models are comparable for building a meta learner
                 seed = 2023, nfolds = 10,
                 keep_cross_validation_predictions = TRUE)

### call 'shapley' function to compute the weighted mean and weighted confidence intervals
### of SHAP values across all trained models.
### Note that the 'newdata' should be the testing dataset!
result <- shapley(models = aml, newdata = prostate, plot = TRUE)

#######################################################
### PLOT THE WEIGHTED MEAN SHAP VALUES
#######################################################

shapley.plot(result, plot = "bar")
shapley.plot(result, plot = "waffle")

#######################################################
### DEFINE DOMAINS (GROUPS OF FEATURES OR FACTORS)
#######################################################
# pass the object returned by shapley() above, not the function itself
shapley.domain(shapley = result, plot = "bar",
               domains = list(Demographic = c("RACE", "AGE"),
                              Cancer = c("VOL", "PSA", "GLEASON"),
                              Tests = c("DPROS", "DCAPS")),
               print = TRUE)

## End(Not run)

shapley documentation built on April 12, 2025, 2:16 a.m.