permshap: Permutation SHAP
In kernelshap: Kernel SHAP

View source: R/permshap.R

permshap

R Documentation

Permutation SHAP

Description

Exact permutation SHAP algorithm with respect to a background dataset, see Strumbelj and Kononenko. The function works for up to 14 features. For eight or more features, we recomment to switch to kernelshap().

Usage

permshap(object, ...)

## Default S3 method:
permshap(
  object,
  X,
  bg_X = NULL,
  pred_fun = stats::predict,
  feature_names = colnames(X),
  bg_w = NULL,
  bg_n = 200L,
  parallel = FALSE,
  parallel_args = NULL,
  verbose = TRUE,
  ...
)

## S3 method for class 'ranger'
permshap(
  object,
  X,
  bg_X = NULL,
  pred_fun = NULL,
  feature_names = colnames(X),
  bg_w = NULL,
  bg_n = 200L,
  parallel = FALSE,
  parallel_args = NULL,
  verbose = TRUE,
  survival = c("chf", "prob"),
  ...
)

Arguments

`object`	Fitted model object.
`...`	Additional arguments passed to `pred_fun(object, X, ...)`.
`X`	`(n \times p)` matrix or `data.frame` with rows to be explained. The columns should only represent model features, not the response (but see `feature_names` on how to overrule this).
`bg_X`	Background data used to integrate out "switched off" features, often a subset of the training data (typically 50 to 500 rows). In cases with a natural "off" value (like MNIST digits), this can also be a single row with all values set to the off value. If no `bg_X` is passed (the default) and if `X` is sufficiently large, a random sample of `bg_n` rows from `X` serves as background data.
`pred_fun`	Prediction function of the form `⁠function(object, X, ...)⁠`, providing `K \ge 1` predictions per row. Its first argument represents the model `object`, its second argument a data structure like `X`. Additional (named) arguments are passed via `...`. The default, `stats::predict()`, will work in most cases.
`feature_names`	Optional vector of column names in `X` used to calculate SHAP values. By default, this equals `colnames(X)`. Not supported if `X` is a matrix.
`bg_w`	Optional vector of case weights for each row of `bg_X`. If `bg_X = NULL`, must be of same length as `X`. Set to `NULL` for no weights.
`bg_n`	If `bg_X = NULL`: Size of background data to be sampled from `X`.
`parallel`	If `TRUE`, use parallel `foreach::foreach()` to loop over rows to be explained. Must register backend beforehand, e.g., via 'doFuture' package, see README for an example. Parallelization automatically disables the progress bar.
`parallel_args`	Named list of arguments passed to `foreach::foreach()`. Ideally, this is `NULL` (default). Only relevant if `parallel = TRUE`. Example on Windows: if `object` is a GAM fitted with package 'mgcv', then one might need to set `parallel_args = list(.packages = "mgcv")`.
`verbose`	Set to `FALSE` to suppress messages and the progress bar.
`survival`	Should cumulative hazards ("chf", default) or survival probabilities ("prob") per time be predicted? Only in `ranger()` survival models.

Value

An object of class "kernelshap" with the following components:

S: (n \times p) matrix with SHAP values or, if the model output has dimension K > 1, a list of K such matrices.
X: Same as input argument X.
baseline: Vector of length K representing the average prediction on the background data.
bg_X: The background data.
bg_w: The background case weights.
m_exact: Integer providing the effective number of exact on-off vectors used.
exact: Logical flag indicating whether calculations are exact or not (currently TRUE).
txt: Summary text.
predictions: (n \times K) matrix with predictions of X.
algorithm: "permshap".

Methods (by class)

permshap(default): Default permutation SHAP method.
permshap(ranger): Permutation SHAP method for "ranger" models, see Readme for an example.

References

Erik Strumbelj and Igor Kononenko. Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems 41, 2014.

Examples

# MODEL ONE: Linear regression
fit <- lm(Sepal.Length ~ ., data = iris)

# Select rows to explain (only feature columns)
X_explain <- iris[-1]

# Calculate SHAP values
s <- permshap(fit, X_explain)
s

# MODEL TWO: Multi-response linear regression
fit <- lm(as.matrix(iris[, 1:2]) ~ Petal.Length + Petal.Width + Species, data = iris)
s <- permshap(fit, iris[3:5])
s

# Note 1: Feature columns can also be selected 'feature_names'
# Note 2: Especially when X is small, pass a sufficiently large background data bg_X
s <- permshap(
  fit,
  iris[1:4, ],
  bg_X = iris,
  feature_names = c("Petal.Length", "Petal.Width", "Species")
)
s

kernelshap documentation built on Sept. 11, 2024, 9:35 p.m.