View source: R/explain_forecast.R
explain_forecast | R Documentation |
Computes dependence-aware Shapley values for observations in explain_idx from the specified model by using the method specified in approach to estimate the conditional expectation.
explain_forecast(
  model,
  y,
  xreg = NULL,
  train_idx = NULL,
  explain_idx,
  explain_y_lags,
  explain_xreg_lags = explain_y_lags,
  horizon,
  approach,
  prediction_zero,
  n_combinations = NULL,
  group_lags = TRUE,
  group = NULL,
  n_samples = 1000,
  n_batches = NULL,
  seed = 1,
  keep_samp_for_vS = FALSE,
  predict_model = NULL,
  get_model_specs = NULL,
  timing = TRUE,
  verbose = 0,
  ...
)
model |
The model whose predictions we want to explain.
Run |
y |
Matrix, data.frame/data.table or a numeric vector. Contains the endogenous variables used to estimate the (conditional) distributions needed to properly estimate the conditional expectations in the Shapley formula, including the observations to be explained. |
xreg |
Matrix, data.frame/data.table or a numeric vector. Contains the exogenous variables used to estimate the (conditional) distributions needed to properly estimate the conditional expectations in the Shapley formula, including the observations to be explained. As exogenous variables are used contemporaneously when producing a forecast, this item should contain nrow(y) + horizon rows. |
train_idx |
Numeric vector.
The row indices in y and xreg denoting points in time to use when estimating the conditional expectations in
the Shapley value formula.
If |
explain_idx |
Numeric vector. The row indices in y and xreg denoting points in time to explain. |
explain_y_lags |
Numeric vector.
Denotes the number of lags that should be used for each variable in |
explain_xreg_lags |
Numeric vector.
If |
horizon |
Numeric.
The forecast horizon to explain. Passed to the |
approach |
Character vector of length |
prediction_zero |
Numeric. The prediction value for unseen data, i.e. an estimate of the expected prediction without conditioning on any features. Typically we set this value equal to the mean of the response variable in our training data, but other choices such as the mean of the predictions in the training data are also reasonable. |
n_combinations |
Integer.
If |
group_lags |
Logical.
If |
group |
List.
If |
n_samples |
Positive integer. Indicates the maximum number of samples to use in the Monte Carlo integration for every conditional expectation. See also details. |
n_batches |
Positive integer (or NULL).
Specifies how many batches the total number of feature combinations should be split into when calculating the
contribution function for each test observation.
The default value is NULL which uses a reasonable trade-off between RAM allocation and computation speed,
which depends on |
seed |
Positive integer.
Sets the seed before any randomness-based code is run.
If |
keep_samp_for_vS |
Logical.
Indicates whether the samples used in the Monte Carlo estimation of v_S should be returned
(in |
predict_model |
Function.
The prediction function used when |
get_model_specs |
Function.
An optional function for checking model/data consistency when
If |
timing |
Logical.
Whether the timing of the different parts of the |
verbose |
An integer specifying the level of verbosity. If |
... |
Arguments passed on to
|
This function explains a forecast of length horizon. The argument train_idx is analogous to x_train in explain(); however, it only contains the time indices at which the forecast should start for each training sample. In the same way, explain_idx defines the time index (or indices) that will precede a forecast to be explained.
As any autoregressive forecast model requires a set of lags to make a forecast at an arbitrary point in time, explain_y_lags and explain_xreg_lags define how many lags are required to "refit" the model at any given time index. This allows the different approaches to work in the same way they do for time-invariant models.
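As a rough illustration of what the lag arguments mean, the sketch below collects the most recent observations that precede a forecast for a given time index, using base R only. The helper lag_window() is hypothetical and not part of shapr; it assumes that each index in train_idx or explain_idx is paired with the explain_y_lags most recent values up to and including that index.

```r
# Hypothetical helper (not part of shapr): return the `n_lags` most recent
# values of `y` up to and including time index `idx`, most recent first.
lag_window <- function(y, idx, n_lags) {
  stopifnot(idx >= n_lags, idx <= length(y))
  y[idx - seq_len(n_lags) + 1]
}

temp <- airquality$Temp

# Lags available when a forecast starts after T = 152 with explain_y_lags = 2
lag_window(temp, idx = 152, n_lags = 2)

# One row of lags per training index (here train_idx = 2:151)
train_lags <- t(sapply(2:151, lag_window, y = temp, n_lags = 2))
dim(train_lags)
```

Under this assumption, the smallest usable time index equals the number of lags, which is consistent with a train_idx that starts at 2 when two lags are used.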
Object of class c("shapr", "list"). Contains the following items:
data.table with the estimated Shapley values
List with the different parameters, data and functions used internally
Numeric vector with the predictions for the explained observations
List with the values of the MSEv evaluation criterion for the approach.
shapley_values is a data.table where the number of rows equals the number of observations you'd like to explain, and the number of columns equals m + 1, where m equals the total number of features in your model.
If shapley_values[i, j + 1] > 0, it indicates that the j-th feature increased the prediction for the i-th observation. Likewise, if shapley_values[i, j + 1] < 0, it indicates that the j-th feature decreased the prediction for the i-th observation.
The magnitude of the value also matters. For example, if shapley_values[i, k + 1] and shapley_values[i, j + 1] are both greater than 0, where j != k, and shapley_values[i, k + 1] > shapley_values[i, j + 1], this indicates that features j and k both increased the value of the prediction, but that the effect of the k-th feature was larger than that of the j-th feature.
The first column in dt, called none, is the prediction value not assigned to any of the features (\phi_0). It is equal for all observations and is set by the user through the argument prediction_zero. The difference between the prediction and none is distributed among the other features. In theory, this value should be the expected prediction without conditioning on any features. Typically we set this value equal to the mean of the response variable in our training data, but other choices, such as the mean of the predictions in the training data, are also reasonable.
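The reading of the table described above can be tried out on a small made-up example. The numbers and the column names Temp.1 and Temp.2 below are invented for illustration and do not come from any fitted model; the only property relied on is the one stated above, namely that none plus the feature contributions adds up to the prediction.

```r
# Made-up Shapley values for two explained observations (illustration only).
shapley_values <- data.frame(
  none   = c(77.9, 77.9),   # prediction_zero, identical for all rows
  Temp.1 = c(-1.5, 2.0),    # hypothetical contribution of the first lag
  Temp.2 = c(0.4, 0.6)      # hypothetical contribution of the second lag
)

# none + feature contributions reproduces the prediction for each row
pred <- rowSums(shapley_values)

# The difference between the prediction and none is what the features share
feature_total <- pred - shapley_values$none

# Sign gives the direction of each effect, magnitude gives its strength
direction <- sign(shapley_values[, -1])
strongest <- names(shapley_values)[-1][max.col(abs(shapley_values[, -1]))]
```

In this toy table the first lag lowers the prediction for the first observation and raises it for the second, and it has the largest absolute contribution in both rows.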
Martin Jullum, Lars Henry Berge Olsen
Aas, K., Jullum, M., & Løland, A. (2021). Explaining individual predictions when features are dependent: More accurate approximations to Shapley values. Artificial Intelligence, 298, 103502.
# Load example data
data("airquality")
data <- data.table::as.data.table(airquality)
# Fit an AR(2) model.
model_ar_temp <- ar(data$Temp, order = 2)
# Calculate the zero prediction values for a three step forecast.
p0_ar <- rep(mean(data$Temp), 3)
# Empirical approach, explaining forecasts starting at T = 152 and T = 153.
explain_forecast(
  model = model_ar_temp,
  y = data[, "Temp"],
  train_idx = 2:151,
  explain_idx = 152:153,
  explain_y_lags = 2,
  horizon = 3,
  approach = "empirical",
  prediction_zero = p0_ar,
  group_lags = FALSE
)
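To see the kind of forecast that is being explained, the sketch below produces a three-step forecast from the same AR model with base R's predict(), starting after T = 152. It is only an illustration under the assumption that the model is forecast from the observations up to the chosen time index; it does not show how shapr calls the model internally.

```r
# Same model as in the example above, refit here so the sketch is self-contained.
temp <- airquality$Temp
model_ar_temp <- ar(temp, order = 2)

# Three-step forecast that starts after time index 152 (horizon = 3).
fc <- predict(model_ar_temp, newdata = temp[1:152], n.ahead = 3, se.fit = FALSE)

# One zero-prediction value per forecast step, as in the example above.
p0_ar <- rep(mean(temp), 3)

# Per-step gap between the forecast and the zero prediction; this is the
# amount that the Shapley values distribute among the features.
as.numeric(fc) - p0_ar
```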