View source: R/fastshap_explain_fastshap_modified.R
Explain    R Documentation

Description

Compute fast (approximate) Shapley values for a set of features using the Monte Carlo algorithm described in Strumbelj and Kononenko (2014). An efficient algorithm for tree-based models, commonly referred to as Tree SHAP, is also supported for lightgbm (https://cran.r-project.org/package=lightgbm) and xgboost (https://cran.r-project.org/package=xgboost) models; see Lundberg et al. (2020) for details.
Usage

Explain(object, ...)

## Default S3 method:
Explain(
  object,
  feature_names = NULL,
  X = NULL,
  nsim = 1,
  pred_wrapper = NULL,
  newdata = NULL,
  parallel = FALSE,
  ...
)

## S3 method for class 'lm'
Explain(
  object,
  feature_names = NULL,
  X,
  nsim = 1,
  pred_wrapper,
  newdata = NULL,
  exact = FALSE,
  parallel = FALSE,
  ...
)

## S3 method for class 'xgb.Booster'
Explain(
  object,
  feature_names = NULL,
  X = NULL,
  nsim = 1,
  pred_wrapper,
  newdata = NULL,
  exact = FALSE,
  parallel = FALSE,
  ...
)

## S3 method for class 'lgb.Booster'
Explain(
  object,
  feature_names = NULL,
  X = NULL,
  nsim = 1,
  pred_wrapper,
  newdata = NULL,
  exact = FALSE,
  parallel = FALSE,
  ...
)
Arguments

object
  A fitted model object (e.g., a ppr, lm, xgb.Booster, or lgb.Booster object).

...
  Additional optional arguments to be passed on to the individual methods.

feature_names
  Character string giving the names of the predictor variables (i.e., features) of interest. If NULL (the default), they are taken from the column names of X.

X
  A matrix-like R object (e.g., a data frame or matrix) containing ONLY the feature columns from the training data (or a suitable background data set). If the input includes categorical variables that need to be one-hot encoded, supply data that has already been one-hot encoded.

nsim
  The number of Monte Carlo repetitions to use for estimating each Shapley value (only used when exact = FALSE). Default is 1.

pred_wrapper
  Prediction function that requires two arguments, object and newdata, and returns a numeric vector of predictions.

newdata
  A matrix-like R object (e.g., a data frame or matrix) containing ONLY the feature columns for the observation(s) of interest; that is, the observation(s) you want to compute explanations for. Default is NULL, in which case explanations are computed for all rows of X.

parallel
  Logical indicating whether or not to compute the approximate Shapley values in parallel across features; default is FALSE.

exact
  Logical indicating whether to compute exact Shapley values. Currently only available for lm, glm, xgb.Booster, and lgb.Booster objects; default is FALSE.
Value

An object of class Explain with the following components:

newdata
  The data-frame-formatted dataset used to estimate the Shapley values. Categorical variables, if present, are one-hot encoded.

phis
  A list containing the Shapley values for each variable.

fnull
  The expected value of the model's predictions.

fx
  The prediction for each observation.

factor_names
  The names of the categorical variables. If the data contain only continuous or dummy variables, this is NULL.
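As a rough illustration of how these components might be inspected (an assumption here is that the returned Explain object can be accessed like a named list, which the component names above suggest; `shap` refers to the object created in the Examples below):

# Hypothetical sketch: inspecting the components listed above
str(shap$phis)   # per-variable Shapley values
shap$fnull       # expected value of the model's predictions
head(shap$fx)    # predictions for the explained observations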
Details

Setting exact = TRUE with a linear model (i.e., a stats::lm() or stats::glm() object) assumes that the input features are independent.
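A minimal sketch of the exact, linear-model path, following the 'lm' method signature shown in Usage (whether pred_wrapper is strictly required when exact = TRUE is an assumption here, so one is supplied anyway):

# Hedged sketch: exact Shapley values for a linear model via the 'lm' method
fit_lm <- lm(mpg ~ ., data = mtcars)
pfun_lm <- function(object, newdata) predict(object, newdata = newdata)
shap_lm <- Explain(fit_lm, X = subset(mtcars, select = -mpg),
                   exact = TRUE, pred_wrapper = pfun_lm)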
References

Strumbelj, E., and Kononenko, I. (2014). Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 41(3), 647-665.

Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., and Lee, S.-I. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1), 56-67.
Examples

#
# A projection pursuit regression (PPR) example
#
# Load the sample data; see datasets::mtcars for details
data(mtcars)
# Fit a projection pursuit regression model
fit <- ppr(mpg ~ ., data = mtcars, nterms = 5)
# Prediction wrapper
pfun <- function(object, newdata) { # needs to return a numeric vector
predict(object, newdata = newdata)
}
# Compute approximate Shapley values using 10 Monte Carlo simulations
set.seed(101) # for reproducibility
shap <- Explain(fit, X = subset(mtcars, select = -mpg), nsim = 10,
pred_wrapper = pfun)
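# A hedged sketch of the Tree SHAP path for an xgboost model; assumes the
# xgboost package is installed, and follows the 'xgb.Booster' method in Usage.
X <- data.matrix(subset(mtcars, select = -mpg))
fit_xgb <- xgboost::xgboost(data = X, label = mtcars$mpg,
                            nrounds = 50, verbose = 0)
pfun_xgb <- function(object, newdata) predict(object, newdata = newdata)
shap_xgb <- Explain(fit_xgb, X = X, exact = TRUE, pred_wrapper = pfun_xgb)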