View source: R/fastshap_explain_fastshap_modified.R
Explain    R Documentation

Description

Compute fast (approximate) Shapley values for a set of features using the Monte Carlo algorithm described in Strumbelj and Kononenko (2014). An efficient algorithm for tree-based models, commonly referred to as Tree SHAP, is also supported for lightgbm (https://cran.r-project.org/package=lightgbm) and xgboost (https://cran.r-project.org/package=xgboost) models; see Lundberg et al. (2020) for details.
Usage

Explain(object, ...)

## Default S3 method:
Explain(
  object,
  feature_names = NULL,
  X = NULL,
  nsim = 1,
  pred_wrapper = NULL,
  newdata = NULL,
  parallel = FALSE,
  ...
)

## S3 method for class 'lm'
Explain(
  object,
  feature_names = NULL,
  X,
  nsim = 1,
  pred_wrapper,
  newdata = NULL,
  exact = FALSE,
  parallel = FALSE,
  ...
)

## S3 method for class 'xgb.Booster'
Explain(
  object,
  feature_names = NULL,
  X = NULL,
  nsim = 1,
  pred_wrapper,
  newdata = NULL,
  exact = FALSE,
  parallel = FALSE,
  ...
)

## S3 method for class 'lgb.Booster'
Explain(
  object,
  feature_names = NULL,
  X = NULL,
  nsim = 1,
  pred_wrapper,
  newdata = NULL,
  exact = FALSE,
  parallel = FALSE,
  ...
)
Arguments

object
  A fitted model object (e.g., a ppr, lm, xgb.Booster, or lgb.Booster object).

...
  Additional optional arguments to be passed on to the individual methods.

feature_names
  Character string giving the names of the predictor variables (i.e., features) of interest. If NULL (the default), they are taken from the column names of X.

X
  A matrix-like R object (e.g., a data frame or matrix) containing ONLY the feature columns from the training data (or a suitable background data set). If the input includes categorical variables that need to be one-hot encoded, supply data that has already been one-hot encoded.

nsim
  The number of Monte Carlo repetitions to use for estimating each Shapley value (only used when exact = FALSE). Default is 1.

pred_wrapper
  Prediction function that requires two arguments, object and newdata, and returns a numeric vector of predictions.

newdata
  A matrix-like R object (e.g., a data frame or matrix) containing ONLY the feature columns for the observation(s) of interest; that is, the observation(s) you want to compute explanations for. Default is NULL, in which case explanations are computed for all rows of X.

parallel
  Logical indicating whether or not to compute the approximate Shapley values in parallel across features; default is FALSE.

exact
  Logical indicating whether to compute exact Shapley values. Currently only available for lm, glm, xgb.Booster, and lgb.Booster objects; default is FALSE.
Value

An object of class Explain with the following components:

newdata
  The data-frame-formatted dataset used to estimate the Shapley values. Categorical variables, if present, are one-hot encoded.

phis
  A list containing the Shapley values for each variable.

fnull
  The expected value of the model's predictions.

fx
  The prediction for each observation.

factor_names
  The names of the categorical variables. If the data contain only continuous or dummy variables, this is NULL.
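As a rough illustration of how these components might be inspected (an assumption here is that the returned Explain object can be accessed like a named list, which the component names above suggest; `shap` refers to the object created in the Examples below):

# Hypothetical sketch: inspecting the components listed above
str(shap$phis)   # per-variable Shapley values
shap$fnull       # expected value of the model's predictions
head(shap$fx)    # predictions for the explained observations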
Details

Setting exact = TRUE with a linear model (i.e., a stats::lm() or stats::glm() object) assumes that the input features are independent.
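A minimal sketch of the exact, linear-model path, following the 'lm' method signature shown in Usage (whether pred_wrapper is strictly required when exact = TRUE is an assumption here, so one is supplied anyway):

# Hedged sketch: exact Shapley values for a linear model via the 'lm' method
fit_lm <- lm(mpg ~ ., data = mtcars)
pfun_lm <- function(object, newdata) predict(object, newdata = newdata)
shap_lm <- Explain(fit_lm, X = subset(mtcars, select = -mpg),
                   exact = TRUE, pred_wrapper = pfun_lm)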
References

Strumbelj, E., and Kononenko, I. (2014). Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 41(3), 647-665.

Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., and Lee, S.-I. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1), 56-67.
Examples

#
# A projection pursuit regression (PPR) example
#
# Load the sample data; see datasets::mtcars for details
data(mtcars)
# Fit a projection pursuit regression model
fit <- ppr(mpg ~ ., data = mtcars, nterms = 5)
# Prediction wrapper
pfun <- function(object, newdata) { # needs to return a numeric vector
predict(object, newdata = newdata)
}
# Compute approximate Shapley values using 10 Monte Carlo simulations
set.seed(101) # for reproducibility
shap <- Explain(fit, X = subset(mtcars, select = -mpg), nsim = 10,
pred_wrapper = pfun)
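# A hedged sketch of the Tree SHAP path for an xgboost model; assumes the
# xgboost package is installed, and follows the 'xgb.Booster' method in Usage.
X <- data.matrix(subset(mtcars, select = -mpg))
fit_xgb <- xgboost::xgboost(data = X, label = mtcars$mpg,
                            nrounds = 50, verbose = 0)
pfun_xgb <- function(object, newdata) predict(object, newdata = newdata)
shap_xgb <- Explain(fit_xgb, X = X, exact = TRUE, pred_wrapper = pfun_xgb)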