create_marginal_data_cat: Create marginal categorical data for causal Shapley values
In shapr: Prediction Explanation with Dependence-Aware Shapley Values

View source: R/asymmetric_and_casual_Shapley.R

create_marginal_data_cat

R Documentation

Create marginal categorical data for causal Shapley values

Description

This function is used when we generate marginal data for the categorical approach when we have several sampling steps. We need to treat this separately, as we here in the marginal step CANNOT make feature values such that the combination of those and the feature values we condition in S are NOT in categorical.joint_prob_dt. If we do this, then we cannot progress further in the chain of sampling steps. E.g., X1 in (1,2,3), X2 in (1,2,3), and X3 in (1,2,3). We know X2 = 2, and let causal structure be X1 -> X2 -> X3. Assume that P(X1 = 1, X2 = 2, X = 3) = P(X1 = 2, X2 = 2, X = 3) = 1/2. Then there is no point generating X1 = 3, as we then cannot generate X3. The solution is only to generate the values which can proceed through the whole chain of sampling steps. To do that, we have to ensure the the marginal sampling respects the valid feature coalitions for all sets of conditional features, i.e., the features in features_steps_cond_on. We sample from the valid coalitions using the MARGINAL probabilities.

Usage

create_marginal_data_cat(
  n_MC_samples,
  x_explain,
  Sbar_features,
  S_original,
  joint_prob_dt
)

Arguments

`n_MC_samples`	Positive integer. For most approaches, it indicates the maximum number of samples to use in the Monte Carlo integration of every conditional expectation. For `approach="ctree"`, `n_MC_samples` corresponds to the number of samples from the leaf node (see an exception related to the `ctree.sample` argument `setup_approach.ctree()`). For `approach="empirical"`, `n_MC_samples` is the `K` parameter in equations (14-15) of Aas et al. (2021), i.e. the maximum number of observations (with largest weights) that is used, see also the `empirical.eta` argument `setup_approach.empirical()`.
`x_explain`	Matrix or data.frame/data.table. Contains the the features, whose predictions ought to be explained.
`Sbar_features`	Vector of integers containing the features indices to generate marginal observations for. That is, if `Sbar_features` is `c(1,4)`, then we sample `n_MC_samples` observations from `P(X_1, X_4)`. That is, we sample the first and fourth feature values from the same valid feature coalition using the marginal probability, so we do not break the dependence between them.
`S_original`	Vector of integers containing the features indices of the original coalition `S`. I.e., not the features in the current sampling step, but the features are known to us before starting the chain of sampling steps.