explain_forecast: Explain a forecast from a time series model using Shapley...

View source: R/explain_forecast.R

explain_forecast    R Documentation

Explain a forecast from a time series model using Shapley values.

Description

Computes dependence-aware Shapley values for observations in explain_idx from the specified model by using the method specified in approach to estimate the conditional expectation.

Usage

explain_forecast(
  model,
  y,
  xreg = NULL,
  train_idx = NULL,
  explain_idx,
  explain_y_lags,
  explain_xreg_lags = explain_y_lags,
  horizon,
  approach,
  prediction_zero,
  n_combinations = NULL,
  group_lags = TRUE,
  group = NULL,
  n_samples = 1000,
  n_batches = NULL,
  seed = 1,
  keep_samp_for_vS = FALSE,
  predict_model = NULL,
  get_model_specs = NULL,
  timing = TRUE,
  verbose = 0,
  ...
)

Arguments

model

The model whose predictions we want to explain. Run get_supported_models() for a table of the models that explain supports natively. Unsupported models can still be explained by passing predict_model and (optionally) get_model_specs; see Details for more information.

y

Matrix, data.frame/data.table or a numeric vector. Contains the endogenous variables used to estimate the (conditional) distributions needed to properly estimate the conditional expectations in the Shapley formula including the observations to be explained.

xreg

Matrix, data.frame/data.table or a numeric vector. Contains the exogenous variables used to estimate the (conditional) distributions needed to properly estimate the conditional expectations in the Shapley formula, including the observations to be explained. As exogenous variables are used contemporaneously when producing a forecast, this item should contain nrow(y) + horizon rows.
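As a rough illustration of the row-count requirement, the sketch below builds a toy y and a matching xreg in base R. The values and the names n_obs, y_toy and xreg_toy are made up for illustration and are not part of shapr.

```r
# Toy sizes (hypothetical, for illustration only)
n_obs <- 100
horizon <- 3

# Endogenous series: one row per observed time point
y_toy <- data.frame(Temp = 60 + cumsum(rnorm(n_obs)))

# Exogenous series: must also cover the `horizon` forecast steps
xreg_toy <- data.frame(Wind = runif(n_obs + horizon, min = 2, max = 20))

# The requirement stated above: xreg has nrow(y) + horizon rows
stopifnot(nrow(xreg_toy) == nrow(y_toy) + horizon)
```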

train_idx

Numeric vector. The row indices in y and xreg denoting the points in time to use when estimating the conditional expectations in the Shapley value formula. If train_idx = NULL (default), all indices not selected to be explained will be used.

explain_idx

Numeric vector. The row indices in y and xreg denoting the points in time to explain.

explain_y_lags

Numeric vector. Denotes the number of lags that should be used for each variable in y when making a forecast.

explain_xreg_lags

Numeric vector. If xreg != NULL, denotes the number of lags that should be used for each variable in xreg when making a forecast.

horizon

Numeric. The forecast horizon to explain. Passed to the predict_model function.

approach

Character vector of length 1 or one less than the number of features. All elements should be either "gaussian", "copula", "empirical", "ctree", "vaeac", "categorical", "timeseries", "independence", "regression_separate", or "regression_surrogate". The two regression approaches cannot be combined with any other approach. See details for more information.

prediction_zero

Numeric. The prediction value for unseen data, i.e. an estimate of the expected prediction without conditioning on any features. Typically we set this value equal to the mean of the response variable in our training data, but other choices such as the mean of the predictions in the training data are also reasonable.

n_combinations

Integer. If group = NULL, n_combinations represents the number of unique feature combinations to sample. If group != NULL, n_combinations represents the number of unique group combinations to sample. If n_combinations = NULL, the exact method is used and all combinations are considered. The maximum number of combinations equals 2^m, where m is the number of features.
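To get a feel for how quickly the exact method grows, the following sketch evaluates 2^m for a few values of m. The helper n_exact_combinations is a hypothetical name used only here; it is not a shapr function.

```r
# Number of combinations considered by the exact method for m features
# (or m groups when `group` is supplied). Hypothetical helper.
n_exact_combinations <- function(m) 2^m

n_exact_combinations(2)   # 4
n_exact_combinations(10)  # 1024
n_exact_combinations(20)  # 1048576
```

With more than a handful of features or groups, setting n_combinations to a value below 2^m switches from the exact method to sampling, as described above.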

group_lags

Logical. If TRUE all lags of each variable are grouped together and explained as a group. If FALSE all lags of each variable are explained individually.

group

List. If NULL, regular feature-wise Shapley values are computed. If provided, group-wise Shapley values are computed. group then has length equal to the number of groups. Each list element is a character vector with the names of the features included in that group.
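A group list of the shape described above might look like the sketch below. The feature names Temp.1, Temp.2, Wind.1 and Wind.2 are assumptions for illustration; the names actually used depend on y, xreg and the chosen lags.

```r
# Hypothetical grouping: one group per variable, each holding its lags
group_toy <- list(
  temperature = c("Temp.1", "Temp.2"),
  wind = c("Wind.1", "Wind.2")
)

# Number of groups, and the features in each
length(group_toy)
lengths(group_toy)
```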

n_samples

Positive integer. The maximum number of samples to use in the Monte Carlo integration for every conditional expectation. See also details.

n_batches

Positive integer (or NULL). Specifies how many batches the total number of feature combinations should be split into when calculating the contribution function for each test observation. The default value is NULL which uses a reasonable trade-off between RAM allocation and computation speed, which depends on approach and n_combinations. For models with many features, increasing the number of batches reduces the RAM allocation significantly. This typically comes with a small increase in computation time.

seed

Positive integer. Specifies the seed set before any code involving randomness is run. If NULL, the seed is inherited from the calling environment.

keep_samp_for_vS

Logical. Indicates whether the samples used in the Monte Carlo estimation of v_S should be returned (in internal$output).

predict_model

Function. The prediction function used when model is not natively supported. (Run get_supported_models() for a list of natively supported models.) The function must have two arguments, model and newdata which specify, respectively, the model and a data.frame/data.table to compute predictions for. The function must give the prediction as a numeric vector. NULL (the default) uses functions specified internally. Can also be used to override the default function for natively supported model classes.

get_model_specs

Function. An optional function for checking model/data consistency when model is not natively supported. (Run get_supported_models() for a list of natively supported models.) The function takes model as argument and provides a list with 3 elements:

labels

Character vector with the names of each feature.

classes

Character vector with the class of each feature.

factor_levels

Character vector with the levels for any categorical features.

If NULL (the default) internal functions are used for natively supported model classes, and the checking is disabled for unsupported model classes. Can also be used to override the default function for natively supported model classes.
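A minimal sketch of a function returning the three elements listed above is given below. It assumes a hypothetical model object that stores its feature names in model$feature_names and has only numeric features; the exact structure shapr expects for classes and factor_levels (named vector versus named list) is also an assumption here.

```r
# Sketch of a get_model_specs function for a hypothetical model object.
# Assumes feature names live in `model$feature_names` (not a shapr convention).
get_model_specs_toy <- function(model) {
  labels <- model$feature_names
  list(
    labels = labels,
    # All features are assumed to be numeric in this toy case
    classes = setNames(rep("numeric", length(labels)), labels),
    # No categorical features, so every entry is left empty (NULL)
    factor_levels = setNames(vector("list", length(labels)), labels)
  )
}

toy_model <- list(feature_names = c("Temp.1", "Temp.2"))
specs <- get_model_specs_toy(toy_model)
str(specs)
```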

timing

Logical. Whether the timing of the different parts of explain() should be saved in the model object.

verbose

An integer specifying the level of verbosity. If 0 (default), shapr stays silent. If 1, it prints information about performance. If 2, some additional information is printed.

...

Arguments passed on to setup_approach.empirical, setup_approach.independence, setup_approach.gaussian, setup_approach.copula, setup_approach.ctree, setup_approach.vaeac, setup_approach.categorical, setup_approach.timeseries

empirical.type

Character. (default = "fixed_sigma") Should be equal to either "independence", "fixed_sigma", "AICc_each_k", or "AICc_full".

empirical.eta

Numeric. (default = 0.95) Needs to be 0 < eta <= 1. Represents the minimum proportion of the total empirical weight that data samples should use. If e.g. eta = .8 we will choose the K samples with the largest weight so that the sum of the weights accounts for 80% of the total weight. eta is the \eta parameter in equation (15) of Aas et al (2021).
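The selection rule for K can be sketched in base R as follows. The weights are made up, and this only illustrates the description above; it is not shapr's internal implementation.

```r
# Made-up, unnormalised sample weights (for illustration only)
weights <- c(5, 0.5, 2.5, 1, 0.5, 0.5)
eta <- 0.8

# Sort from largest to smallest and accumulate the share of the total weight
sorted_weights <- sort(weights, decreasing = TRUE)
cumulative_share <- cumsum(sorted_weights) / sum(sorted_weights)

# Smallest K whose weights account for at least a proportion eta of the total
K <- which(cumulative_share >= eta)[1]
K
```

Here the two largest weights cover 75% of the total and the three largest cover 85%, so K is 3.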

empirical.fixed_sigma

Positive numeric scalar. (default = 0.1) Represents the kernel bandwidth in the distance computation used when conditioning on all different combinations. Only used when empirical.type = "fixed_sigma".

empirical.n_samples_aicc

Positive integer. (default = 1000) Number of samples to consider in AICc optimization. Only used when empirical.type is either "AICc_each_k" or "AICc_full".

empirical.eval_max_aicc

Positive integer. (default = 20) Maximum number of iterations when optimizing the AICc. Only used when empirical.type is either "AICc_each_k" or "AICc_full".

empirical.start_aicc

Numeric. (default = 0.1) Start value of the sigma parameter when optimizing the AICc. Only used when empirical.type is either "AICc_each_k" or "AICc_full".

empirical.cov_mat

Numeric matrix. (Optional, default = NULL) Containing the covariance matrix of the data generating distribution used to define the Mahalanobis distance. NULL means it is estimated from x_train.

internal

Not used.

gaussian.mu

Numeric vector. (Optional) Containing the mean of the data generating distribution. NULL means it is estimated from the x_train.

gaussian.cov_mat

Numeric matrix. (Optional) Containing the covariance matrix of the data generating distribution. NULL means it is estimated from the x_train.

ctree.mincriterion

Numeric scalar or vector. (default = 0.95) Either a scalar or vector of length equal to the number of features in the model. Value is equal to 1 - \alpha where \alpha is the nominal level of the conditional independence tests. If it is a vector, this indicates which value to use when conditioning on various numbers of features.

ctree.minsplit

Numeric scalar. (default = 20) Determines the minimum value that the sum of the left and right daughter nodes must have for a split to be made.

ctree.minbucket

Numeric scalar. (default = 7) Determines the minimum sum of weights in a terminal node required for a split.

ctree.sample

Boolean. (default = TRUE) If TRUE, then the method always samples n_samples observations from the leaf nodes (with replacement). If FALSE and the number of observations in the leaf node is less than n_samples, the method will take all observations in the leaf. If FALSE and the number of observations in the leaf node is more than n_samples, the method will sample n_samples observations (with replacement). This means that there will always be sampling in the leaf unless sample = FALSE AND the number of obs in the node is less than n_samples.
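The leaf-sampling rule described above can be sketched as a small base R function. The name sample_from_leaf and its arguments are hypothetical, and the behaviour when the leaf holds exactly n_samples observations is an assumption, since the description does not cover that case.

```r
# Sketch of the leaf-sampling rule; `leaf_rows` holds hypothetical row indices
# of the training observations that fall in the relevant leaf node.
sample_from_leaf <- function(leaf_rows, n_samples, do_sample = TRUE) {
  if (!do_sample && length(leaf_rows) < n_samples) {
    # Small leaf and do_sample = FALSE: use every observation, no sampling
    return(leaf_rows)
  }
  # Otherwise draw n_samples observations with replacement.
  # (A leaf of exactly n_samples rows is sampled here; that is an assumption.)
  leaf_rows[sample.int(length(leaf_rows), size = n_samples, replace = TRUE)]
}

set.seed(1)
sample_from_leaf(1:5, n_samples = 10, do_sample = FALSE)          # all 5 rows
length(sample_from_leaf(1:5, n_samples = 10, do_sample = TRUE))   # 10
length(sample_from_leaf(1:50, n_samples = 10, do_sample = FALSE)) # 10
```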

vaeac.depth

Positive integer (default is 3). The number of hidden layers in the neural networks of the masked encoder, full encoder, and decoder.

vaeac.width

Positive integer (default is 32). The number of neurons in each hidden layer in the neural networks of the masked encoder, full encoder, and decoder.

vaeac.latent_dim

Positive integer (default is 8). The number of dimensions in the latent space.

vaeac.lr

Positive numeric (default is 0.001). The learning rate used in the torch::optim_adam() optimizer.

vaeac.activation_function

A torch::nn_module() representing an activation function such as, e.g., torch::nn_relu() (default), torch::nn_leaky_relu(), torch::nn_selu(), or torch::nn_sigmoid().

vaeac.n_vaeacs_initialize

Positive integer (default is 4). The number of different vaeac models to initialize at the start. The best performing one after vaeac.extra_parameters$epochs_initiation_phase epochs (default is 2) is picked, and only that one continues training.

vaeac.epochs

Positive integer (default is 100). The number of epochs to train the final vaeac model. This includes vaeac.extra_parameters$epochs_initiation_phase, where the default is 2.

vaeac.extra_parameters

Named list with extra parameters to the vaeac approach. See vaeac_get_extra_para_default() for description of possible additional parameters and their default values.

categorical.joint_prob_dt

Data.table. (Optional) Containing the joint probability distribution for each combination of feature values. NULL means it is estimated from the x_train and x_explain.

categorical.epsilon

Numeric value. (Optional) If categorical.joint_prob_dt is not supplied, probabilities/frequencies are estimated using x_train. If certain observations occur in x_train and NOT in x_explain, then epsilon is used as the proportion of times that these observations occur in the training data. In theory, this proportion should be zero, but this causes an error later in the Shapley computation.

timeseries.fixed_sigma_vec

Numeric. (Default = 2) Represents the kernel bandwidth in the distance computation.

timeseries.bounds

Numeric vector of length two. (Default = c(NULL, NULL)) If one or both of these bounds are not NULL, we restrict the sampled time series to be between these bounds. This is useful if the underlying time series are scaled between 0 and 1, for example.

Details

This function explains a forecast of length horizon. The argument train_idx is analogous to x_train in explain(), but it only contains the time indices at which the forecast should start for each training sample. In the same way, explain_idx defines the time index (or indices) that will precede a forecast to be explained.

As any autoregressive forecast model will require a set of lags to make a forecast at an arbitrary point in time, explain_y_lags and explain_xreg_lags define how many lags are required to "refit" the model at any given time index. This allows the different approaches to work in the same way they do for time-invariant models.
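The idea of turning time indices into lagged feature rows can be sketched with base R's embed(). This only illustrates the concept under made-up data; the column names are hypothetical, and shapr's internal data preparation may differ.

```r
# Toy series and lag count (made up for illustration)
y_toy <- c(10, 12, 11, 13, 15, 14)
n_lags <- 2

# embed() returns one row per time index t >= n_lags, with columns
# y[t], y[t - 1], ..., i.e. the most recent value first
lagged <- embed(y_toy, n_lags)
colnames(lagged) <- paste0("y_lag.", seq_len(n_lags))  # hypothetical names
rownames(lagged) <- n_lags:length(y_toy)
lagged
```

Each row holds the values a forecast starting after that time index would condition on. With two lags the first usable index is 2, which is consistent with train_idx = 2:151 in the Examples section.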

Value

Object of class c("shapr", "list"). Contains the following items:

shapley_values

data.table with the estimated Shapley values

internal

List with the different parameters, data and functions used internally

pred_explain

Numeric vector with the predictions for the explained observations

MSEv

List with the values of the MSEv evaluation criterion for the approach.

shapley_values is a data.table where the number of rows equals the number of observations you'd like to explain, and the number of columns equals m + 1, where m equals the total number of features in your model.

If shapley_values[i, j + 1] > 0 it indicates that the j-th feature increased the prediction for the i-th observation. Likewise, if shapley_values[i, j + 1] < 0 it indicates that the j-th feature decreased the prediction for the i-th observation. The magnitude of the value is also important to notice. E.g. if shapley_values[i, k + 1] and shapley_values[i, j + 1] are greater than 0, where j != k, and shapley_values[i, k + 1] > shapley_values[i, j + 1] this indicates that feature j and k both increased the value of the prediction, but that the effect of the k-th feature was larger than the j-th feature.

The first column in shapley_values, called none, is the prediction value not assigned to any of the features (\phi_0). It's equal for all observations and set by the user through the argument prediction_zero. The difference between the prediction and none is distributed among the other features. In theory this value should be the expected prediction without conditioning on any features. Typically we set this value equal to the mean of the response variable in our training data, but other choices such as the mean of the predictions in the training data are also reasonable.
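Because the difference between the prediction and none is distributed among the features, the columns of each row add up to the prediction for that observation. The sketch below checks this on made-up numbers; the table, its column names and its values are hypothetical and not output from explain_forecast, and any identifier columns in a real result would have to be dropped first.

```r
# Made-up Shapley values for two explained observations and two features
shapley_toy <- data.frame(
  none = c(78, 78),
  Temp.1 = c(-1.5, 2.0),
  Temp.2 = c(0.5, -0.25)
)

# Summing each row recovers the prediction for that observation
pred_toy <- rowSums(shapley_toy)
pred_toy

# In the first row Temp.1 lowered the prediction and Temp.2 raised it
shapley_toy[1, c("Temp.1", "Temp.2")]
```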

Author(s)

Martin Jullum, Lars Henry Berge Olsen

References

Aas, K., Jullum, M., & Løland, A. (2021). Explaining individual predictions when features are dependent: More accurate approximations to Shapley values. Artificial Intelligence, 298, 103502.

Examples


# Load example data
data("airquality")
data <- data.table::as.data.table(airquality)

# Fit an AR(2) model.
model_ar_temp <- ar(data$Temp, order = 2)

# Calculate the zero prediction values for a three step forecast.
p0_ar <- rep(mean(data$Temp), 3)

# Empirical approach, explaining forecasts starting at T = 152 and T = 153.
explain_forecast(
  model = model_ar_temp,
  y = data[, "Temp"],
  train_idx = 2:151,
  explain_idx = 152:153,
  explain_y_lags = 2,
  horizon = 3,
  approach = "empirical",
  prediction_zero = p0_ar,
  group_lags = FALSE
)


NorskRegnesentral/shapr documentation built on April 19, 2024, 1:19 p.m.