setup: check_setup
In shapr: Prediction Explanation with Dependence-Aware Shapley Values

setup

R Documentation

check_setup

Description

check_setup

Usage

setup(
  x_train,
  x_explain,
  approach,
  phi0,
  output_size = 1,
  max_n_coalitions,
  group,
  n_MC_samples,
  seed,
  feature_specs,
  type = "regular",
  horizon = NULL,
  y = NULL,
  xreg = NULL,
  train_idx = NULL,
  explain_idx = NULL,
  explain_y_lags = NULL,
  explain_xreg_lags = NULL,
  group_lags = NULL,
  verbose,
  iterative = NULL,
  iterative_args = list(),
  is_python = FALSE,
  testing = FALSE,
  init_time = NULL,
  prev_shapr_object = NULL,
  asymmetric = FALSE,
  causal_ordering = NULL,
  confounding = NULL,
  output_args = list(),
  extra_computation_args = list(),
  ...
)

Arguments

`x_train`	Matrix or data.frame/data.table. Contains the data used to estimate the (conditional) distributions for the features needed to properly estimate the conditional expectations in the Shapley formula.
`x_explain`	Matrix or data.frame/data.table. Contains the features whose predictions ought to be explained.
`approach`	Character vector of length `1` or one less than the number of features. All elements should be either `"gaussian"`, `"copula"`, `"empirical"`, `"ctree"`, `"vaeac"`, `"categorical"`, `"timeseries"`, `"independence"`, `"regression_separate"`, or `"regression_surrogate"`. The two regression approaches cannot be combined with any other approach. See details for more information.
`phi0`	Numeric. The prediction value for unseen data, i.e. an estimate of the expected prediction without conditioning on any features. Typically we set this value equal to the mean of the response variable in our training data, but other choices such as the mean of the predictions in the training data are also reasonable.
`output_size`	Scalar integer. Specifies the dimension of the output from the prediction model for every observation.
`max_n_coalitions`	Integer. The upper limit on the number of unique feature/group coalitions to use in the iterative procedure (if `iterative = TRUE`). If `iterative = FALSE` it represents the number of feature/group coalitions to use directly. The quantity refers to the number of unique feature coalitions if `group = NULL`, and group coalitions if `group != NULL`. `max_n_coalitions = NULL` corresponds to `max_n_coalitions=2^n_features`.
`group`	List. If `NULL`, regular feature-wise Shapley values are computed. If provided, group-wise Shapley values are computed. `group` then has length equal to the number of groups. Each list element contains a character vector with the features included in that group. See Jullum et al. (2021) for more information on group-wise Shapley values.
`n_MC_samples`	Positive integer. For most approaches, it indicates the maximum number of samples to use in the Monte Carlo integration of every conditional expectation. For `approach="ctree"`, `n_MC_samples` corresponds to the number of samples from the leaf node (see an exception related to the `ctree.sample` argument of `setup_approach.ctree()`). For `approach="empirical"`, `n_MC_samples` is the `K` parameter in equations (14-15) of Aas et al. (2021), i.e. the maximum number of observations (with largest weights) that are used; see also the `empirical.eta` argument of `setup_approach.empirical()`.
`seed`	Positive integer. Specifies the seed set before any code involving randomness is run. If `NULL` (default) no seed is set in the calling environment.
`feature_specs`	List. The output from `get_model_specs()` or `get_data_specs()`. Contains three elements: `labels` (character vector with the name of each feature), `classes` (character vector with the class of each feature), and `factor_levels` (character vector with the levels of any categorical features).
`type`	Character. Either `"regular"` or `"forecast"`, indicating which function `setup()` is called from and, correspondingly, the type of explanation that should be generated.
`horizon`	Numeric. The forecast horizon to explain. Passed to the `predict_model` function.
`y`	Matrix, data.frame/data.table or a numeric vector. Contains the endogenous variables used to estimate the (conditional) distributions needed to properly estimate the conditional expectations in the Shapley formula including the observations to be explained.
`xreg`	Matrix, data.frame/data.table or a numeric vector. Contains the exogenous variables used to estimate the (conditional) distributions needed to properly estimate the conditional expectations in the Shapley formula including the observations to be explained. As exogenous variables are used contemporaneously when producing a forecast, this item should contain nrow(y) + horizon rows.
`train_idx`	Numeric vector. The row indices in data and reg denoting points in time to use when estimating the conditional expectations in the Shapley value formula. If `train_idx = NULL` (default) all indices not selected to be explained will be used.
`explain_idx`	Numeric vector. The row indices in data and reg denoting points in time to explain.
`explain_y_lags`	Numeric vector. Denotes the number of lags that should be used for each variable in `y` when making a forecast.
`explain_xreg_lags`	Numeric vector. If `xreg != NULL`, denotes the number of lags that should be used for each variable in `xreg` when making a forecast.
`group_lags`	Logical. If `TRUE` all lags of each variable are grouped together and explained as a group. If `FALSE` all lags of each variable are explained individually.
`verbose`	String vector or `NULL`. Specifies the verbosity (printout detail level) through one or more of the strings `"basic"`, `"progress"`, `"convergence"`, `"shapley"` and `"vS_details"`. `"basic"` (default) displays basic information about the computation being performed, in addition to some messages about parameters being set or checks being unavailable due to specific input. `"progress"` displays information about where in the calculation process the function currently is. `"convergence"` displays information on how close to convergence the Shapley value estimates are (only when `iterative = TRUE`). `"shapley"` displays intermediate Shapley value estimates and standard deviations (only when `iterative = TRUE`) and the final estimates. `"vS_details"` displays information about the v(S) estimates. This is most relevant for `approach %in% c("regression_separate", "regression_surrogate", "vaeac")`. `NULL` means no printout. Note that any combination of the five strings can be used. E.g. `verbose = c("basic", "vS_details")` will display basic information + details about the v(S)-estimation process.
`iterative`	Logical or `NULL`. If `NULL` (default), the argument is set to `TRUE` if there are more than 5 features/groups, and `FALSE` otherwise. If eventually `TRUE`, the Shapley values are estimated iteratively. This provides sufficiently accurate Shapley value estimates faster. First an initial number of coalitions is sampled, then bootstrapping is used to estimate the variance of the Shapley values. A convergence criterion is used to determine if the variances of the Shapley values are sufficiently small. If the variances are too high, we estimate the number of required samples to reach convergence, and thereby add more coalitions. The process is repeated until the variances are below the threshold. Specifics related to the iterative process and convergence criterion are set through `iterative_args`.
`iterative_args`	Named list. Specifies the arguments for the iterative procedure. See `get_iterative_args_default()` for description of the arguments and their default values.
`is_python`	Logical. Indicates whether the function is called from the Python wrapper. Default is `FALSE`, which is never changed when calling the function via `explain()` in R. The parameter is later used to disallow running the AICc-versions of the empirical method, as that requires data-based optimization, which is not supported in `shaprpy`.
`testing`	Logical. Only used to remove random components like timing from the object output when comparing output with testthat. Defaults to `FALSE`.
`init_time`	POSIXct object. The time when the `explain()` function was called, as outputted by `Sys.time()`. Used to calculate the time it took to run the full `explain` call.
`prev_shapr_object`	`shapr` object or string. If an object of class `shapr` is provided, or string with a path to where intermediate results are stored, then the function will use the previous object to continue the computation. This is useful if the computation is interrupted or you want higher accuracy than already obtained, and therefore want to continue the iterative estimation. See the general usage vignette for examples.
`asymmetric`	Logical. Not applicable for (regular) non-causal or asymmetric explanations. If `FALSE` (default), `explain` computes regular symmetric Shapley values. If `TRUE`, `explain` computes asymmetric Shapley values based on the (partial) causal ordering given by `causal_ordering`. That is, `explain` only uses the feature combinations/coalitions that respect the causal ordering when computing the asymmetric Shapley values. If `asymmetric` is `TRUE` and `confounding` is `NULL` (default), then `explain` computes asymmetric conditional Shapley values as specified in Frye et al. (2020). If `confounding` is provided, i.e., not `NULL`, then `explain` computes asymmetric causal Shapley values as specified in Heskes et al. (2020).
`causal_ordering`	List. Not applicable for (regular) non-causal or asymmetric explanations. `causal_ordering` is an unnamed list of vectors specifying the components of the partial causal ordering that the coalitions must respect. Each vector represents a component and contains one or more features/groups identified by their names (strings) or indices (integers). If `causal_ordering` is `NULL` (default), no causal ordering is assumed and all possible coalitions are allowed. No causal ordering is equivalent to a causal ordering with a single component that includes all features (`list(1:n_features)`) or groups (`list(1:n_groups)`) for feature-wise and group-wise Shapley values, respectively. For feature-wise Shapley values and `causal_ordering = list(c(1, 2), c(3, 4))`, the interpretation is that features 1 and 2 are the ancestors of features 3 and 4, while features 3 and 4 are on the same level. Note: All features/groups must be included in the `causal_ordering` without any duplicates.
`confounding`	Logical vector. Not applicable for (regular) non-causal or asymmetric explanations. `confounding` is a vector of logicals specifying whether confounding is assumed or not for each component in the `causal_ordering`. If `NULL` (default), then no assumption about the confounding structure is made and `explain` computes asymmetric/symmetric conditional Shapley values, depending on the value of `asymmetric`. If `confounding` is a single logical, i.e., `FALSE` or `TRUE`, then this assumption is set globally for all components in the causal ordering. Otherwise, `confounding` must be a vector of logicals of the same length as `causal_ordering`, indicating the confounding assumption for each component. When `confounding` is specified, `explain` computes asymmetric/symmetric causal Shapley values, depending on the value of `asymmetric`. The `approach` cannot be `regression_separate` or `regression_surrogate`, as the regression-based approaches are not applicable to the causal Shapley value methodology.
`output_args`	Named list. Specifies certain arguments related to the output of the function. See `get_output_args_default()` for description of the arguments and their default values.
`extra_computation_args`	Named list. Specifies extra arguments related to the computation of the Shapley values. See `get_extra_comp_args_default()` for description of the arguments and their default values.
`...`	Further arguments passed to specific approaches, see below.
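
Several of the arguments above resolve a `NULL` default or must satisfy a consistency rule. The following base R sketch restates four of those documented conventions as small helper functions. The helper names (`resolve_iterative`, `resolve_max_n_coalitions`, `is_valid_causal_ordering`, `expand_confounding`) are hypothetical and are not part of shapr; they only mirror the rules as worded in the argument descriptions, and shapr's own checks may differ in detail.

```r
# Hypothetical helpers mirroring the documented argument conventions.
# None of these functions are part of shapr.

# iterative = NULL resolves to TRUE when there are more than 5 features/groups.
resolve_iterative <- function(iterative, n_units) {
  if (is.null(iterative)) n_units > 5 else isTRUE(iterative)
}

# max_n_coalitions = NULL corresponds to 2^n_features.
resolve_max_n_coalitions <- function(max_n_coalitions, n_features) {
  if (is.null(max_n_coalitions)) 2^n_features else max_n_coalitions
}

# Every feature index must appear in the ordering, without duplicates.
is_valid_causal_ordering <- function(causal_ordering, n_features) {
  idx <- unlist(causal_ordering)
  !anyDuplicated(idx) && setequal(idx, seq_len(n_features))
}

# A single logical applies to all components; otherwise lengths must match.
expand_confounding <- function(confounding, causal_ordering) {
  if (is.null(confounding)) return(NULL)
  if (length(confounding) == 1L) {
    return(rep(confounding, length(causal_ordering)))
  }
  stopifnot(length(confounding) == length(causal_ordering))
  confounding
}

# Features 1 and 2 are ancestors of features 3 and 4.
causal_ordering <- list(c(1, 2), c(3, 4))

resolve_iterative(NULL, n_units = 4)
resolve_max_n_coalitions(NULL, n_features = 4)
is_valid_causal_ordering(causal_ordering, n_features = 4)
expand_confounding(TRUE, causal_ordering)
```

With four features, the sketch resolves `iterative` to `FALSE` and `max_n_coalitions` to 16, accepts the ordering, and expands the single `confounding` value to one logical per component.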

Value

An internal list containing the parameters, info, data and computations needed for the later computations. The list is expanded and modified in other functions.
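
The pattern of one list being created here and then expanded by later functions can be illustrated with a small base R sketch. Everything in it is hypothetical: `toy_setup()`, `toy_add_coalitions()`, and the element names `parameters`, `data` and `coalitions` are chosen for illustration and do not describe shapr's actual internal layout.

```r
# Hypothetical stand-in for the kind of list described above; the function
# and element names are illustrative, not shapr's actual internals.
toy_setup <- function(x_train, x_explain, approach, phi0) {
  list(
    parameters = list(
      approach = approach,
      phi0 = phi0,
      n_features = ncol(x_train)
    ),
    data = list(x_train = x_train, x_explain = x_explain)
  )
}

# A later step receives the list, adds a new element, and returns it.
toy_add_coalitions <- function(internal) {
  m <- internal$parameters$n_features
  # All 2^m coalitions as rows of a 0/1 membership matrix
  internal$coalitions <- as.matrix(expand.grid(rep(list(0:1), m)))
  internal
}

x_train <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
x_explain <- x_train[1:2, ]

internal <- toy_setup(x_train, x_explain, approach = "gaussian", phi0 = 0)
internal <- toy_add_coalitions(internal)
names(internal)
```

Passing a single list through successive functions keeps each step's signature short, at the cost that every step must know which elements earlier steps have already filled in.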

shapr documentation built on June 8, 2025, 10:22 a.m.