wpp: The Witness Protection Program for Causal Effect Estimation


Description

Perform a search for bounds on the average causal effect (ACE) of a given treatment variable X on a given outcome Y. The bounds are derived from conditional instrumental variables found under a faithfulness assumption that is relaxed to allow for a moderate degree of unfaithfulness. Candidate models are generated by the method described in covsearch.
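The independence tests behind this search are parameterized by prior_ind (the prior probability of an independence) and prior_table (the effective sample size of a Dirichlet prior over contingency tables). The sketch below shows one standard Bayesian test of marginal independence in a two-way table, comparing Dirichlet-multinomial marginal likelihoods. It is an assumption that covsearch works exactly this way, and the names log_dirmult and post_independence are hypothetical. A test of independence given a discrete set Z could sum the log marginal likelihoods over the strata of Z.

```r
# Log marginal likelihood of a sequence of categorical observations with
# counts n under a Dirichlet(a) prior. The multinomial coefficient is
# omitted because it cancels in the Bayes factor below.
log_dirmult <- function(n, a) {
  lgamma(sum(a)) - lgamma(sum(a) + sum(n)) + sum(lgamma(a + n) - lgamma(a))
}

# Posterior probability that the row and column variables of a two-way
# table are independent. Hypothetical helper: 'prior_ind' and 'prior_table'
# mirror the wpp() arguments; splitting prior_table uniformly over cells
# is an assumption of this sketch.
post_independence <- function(tab, prior_ind = 0.5, prior_table = 10) {
  tab <- as.matrix(tab)
  R <- nrow(tab)
  C <- ncol(tab)
  # Dependent model: a single Dirichlet over all R * C joint cells
  ll_dep <- log_dirmult(as.vector(tab), rep(prior_table / (R * C), R * C))
  # Independent model: separate Dirichlets over the row and column margins
  ll_ind <- log_dirmult(rowSums(tab), rep(prior_table / R, R)) +
            log_dirmult(colSums(tab), rep(prior_table / C, C))
  # Posterior log odds of independence = prior log odds + log Bayes factor
  log_odds <- log(prior_ind) - log(1 - prior_ind) + ll_ind - ll_dep
  plogis(log_odds)  # logistic transform back to a probability
}

strong <- matrix(c(45, 5, 5, 45), nrow = 2)  # strongly associated counts
flat   <- matrix(25, nrow = 2, ncol = 2)     # perfectly balanced counts
post_independence(strong)  # essentially 0
post_independence(flat)    # above 0.5
```

Raising prior_ind shifts every posterior toward independence, while a larger prior_table smooths the cell estimates more strongly toward uniform.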

Usage

wpp(problem, epsilons, max_set = 12, prior_ind = 0.5, prior_table = 10,
  cred_calc = TRUE, M = 1000, analytical_bounds = TRUE,
  pop_solve = FALSE, verbose = FALSE)

Arguments

problem

a cfx problem instance for the ACE of a given treatment X on a given outcome Y.

epsilons

an array of six positions corresponding to the relaxation parameters, in this order:
(1) the maximum difference in the conditional probability of the outcome given everything else, as the witness changes levels;
(2) the maximum difference between the conditional probability of the outcome given everything else and the corresponding conditional probability that excludes the latent variables, with the witness set at 0;
(3) as (2), with the witness set at 1;
(4) the maximum difference between the conditional probability of the treatment given its causes and the corresponding conditional probability that excludes the latent variables;
(5) the minimum ratio between the conditional distribution of the latent variable given the witness and the marginal distribution of the latent variable. This has to be in the interval (0, 1];
(6) the maximum ratio between the conditional distribution of the latent variable given the witness and the marginal distribution of the latent variable. This has to be greater than or equal to 1.

max_set

maximum size of the conditioning set. The cost of the procedure grows exponentially with this value, so be careful when increasing the default.

prior_ind

prior probability that a tested independence holds.

prior_table

effective sample size hyperparameter of a Dirichlet prior for testing independence with contingency tables.

cred_calc

if TRUE, compute conditional credible intervals for the ACE under the highest scoring model.

M

number of Monte Carlo samples used when computing (conditional) credible intervals.

analytical_bounds

used only when cred_calc is TRUE: if TRUE, the posterior bounds are computed with the analytical method; if FALSE, with the numerical method, which is tighter but much slower.

pop_solve

if TRUE, assume the population graph in problem is known and use it instead of the data. Note that the data are still used when computing posteriors over bounds.

verbose

if TRUE, print out more detailed information while running the procedure.
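The first four entries of epsilons bound differences between probabilities, while the last two bound a ratio of distributions around 1, so a quick check before calling wpp can catch out-of-range or swapped values. The helper check_epsilons below is hypothetical (it is not part of CausalFX) and assumes entry 5 is the lower and entry 6 the upper ratio bound, matching the values 0.95 and 1.05 used in Examples.

```r
# Hypothetical helper (not part of CausalFX): sanity-check an 'epsilons'
# vector before passing it to wpp().
check_epsilons <- function(epsilons) {
  if (!is.numeric(epsilons) || length(epsilons) != 6)
    stop("epsilons must be a numeric vector of length 6")
  # Entries 1-4 bound differences between probabilities, so lie in [0, 1]
  if (any(epsilons[1:4] < 0 | epsilons[1:4] > 1))
    stop("epsilons[1:4] must lie in [0, 1]")
  # Entries 5-6 bound the ratio P(U | W) / P(U); assumed order: lower, upper
  lower <- epsilons[5]
  upper <- epsilons[6]
  if (lower <= 0 || lower > 1) stop("epsilons[5] must lie in (0, 1]")
  if (upper < 1) stop("epsilons[6] must be >= 1")
  invisible(TRUE)
}

check_epsilons(c(0.2, 0.2, 0.2, 0.2, 0.95, 1.05))  # passes silently
```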

Details

Each witness/admissible set pair found by covsearch generates a corresponding lower bound and upper bound. The bounds reported in bounds are computed from the posterior expected contingency table implied by prior_table and are optimized with a numerical method. Besides these point estimates, posterior distributions of the lower and upper bounds for the highest scoring witness/admissible set pair can also be computed by setting cred_calc to TRUE; they are reported in bounds_post. If analytical_bounds is set to FALSE, the posterior calculation uses the numerical method instead, which provides tighter bounds at a much higher computational cost. Note that these posteriors are conditional on the chosen witness and admissible set: uncertainty about this choice is not taken into account.
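As a rough illustration of a posterior expected contingency table, the sketch below assumes the prior's effective sample size prior_table is split evenly across the cells of the table; the exact prior CausalFX uses is not stated here. The function posterior_expected_table and the counts are hypothetical.

```r
# Posterior mean of the cell probabilities under a symmetric Dirichlet prior
# whose pseudo-counts sum to prior_table (the uniform split is an assumption).
posterior_expected_table <- function(counts, prior_table = 10) {
  alpha <- prior_table / length(counts)           # pseudo-count per cell
  (counts + alpha) / (sum(counts) + prior_table)  # keeps the dims of 'counts'
}

# Hypothetical 2 x 2 table of counts for a binary treatment X and outcome Y
counts <- matrix(c(30, 10, 20, 40), nrow = 2,
                 dimnames = list(X = c("0", "1"), Y = c("0", "1")))
p_xy <- posterior_expected_table(counts, prior_table = 10)

# Plug-in conditional P(Y | X) from the expected joint table (rows sum to 1)
p_y_given_x <- prop.table(p_xy, margin = 1)
p_y_given_x["1", "1"]  # 42.5 / 55, about 0.773
```

Quantities derived from this single table are plug-in point estimates; they carry none of the posterior uncertainty that the credible intervals account for.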

A complete explanation of the method is given by Silva and Evans (2014, "Causal inference through a witness protection program", Advances in Neural Information Processing Systems, 27, 298–306).

Note: messages about numerical problems when calling the bound optimizer are not uncommon and are accounted for within the procedure.
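The posterior distributions controlled by cred_calc and M can be pictured with the following sketch. It assumes each Monte Carlo sample is a draw of the contingency table from its Dirichlet posterior, through which the quantity of interest is then evaluated. In wpp that quantity would be the pair of lower and upper bounds; here a simple difference of conditional probabilities stands in for it. The helper rdirichlet_base is hypothetical and built on base R's rgamma().

```r
set.seed(2014)

# Draw M samples from a Dirichlet(alpha) distribution by normalizing
# independent Gamma(alpha_k, 1) variables (base R has no rdirichlet()).
rdirichlet_base <- function(M, alpha) {
  K <- length(alpha)
  g <- matrix(rgamma(M * K, shape = alpha), nrow = M, ncol = K, byrow = TRUE)
  g / rowSums(g)  # each row now sums to 1
}

# Hypothetical counts of the cells (X, Y) = (0,0), (1,0), (0,1), (1,1)
counts <- c(30, 10, 20, 40)
prior_table <- 10
M <- 1000

# Posterior is Dirichlet(counts + pseudo-counts), assuming a uniform split
draws <- rdirichlet_base(M, counts + prior_table / length(counts))

# Stand-in functional evaluated on every draw: P(Y=1 | X=1) - P(Y=1 | X=0)
diff_draws <- draws[, 4] / (draws[, 2] + draws[, 4]) -
              draws[, 3] / (draws[, 1] + draws[, 3])

# Equal-tailed 95% credible interval from the Monte Carlo sample
quantile(diff_draws, c(0.025, 0.975))
```

Larger M reduces Monte Carlo error in such summaries at a proportional cost, which matters when every draw requires a bound optimization.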

Value

An object of class wpp containing copies of the inputs problem, epsilons, prior_ind, prior_table and analytical_bounds, plus the following fields:

w_list

a list of arrays/lists with fields witness, Z and witness_score: w_list$witness[i] is a witness, w_list$Z[[i]] is the corresponding admissible set, and w_list$witness_score[i] is the score of that witness/admissible set pair.

hw

witness corresponding to the highest scoring pair.

hZ

array containing the admissible set corresponding to the highest scoring pair.

bounds

a two-column matrix where each row corresponds to a different witness/admissible set pair, and the two columns give estimates of the lower and upper bounds as given by the posterior expected value, conditional on the inferred causal structure.

bounds_post

a two-column matrix where rows correspond to different Monte Carlo samples, and the two columns give the lower and upper bounds on the ACE implied by epsilons, for witness hw and admissible set hZ.
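These fields can be inspected with ordinary list and matrix operations, presumably as sol_wpp$w_list, sol_wpp$bounds and so on for a result sol_wpp. The sketch below builds a mock object whose values are invented and whose shape only follows the field descriptions above. It also assumes the rows of bounds follow the same order as the entries of w_list.

```r
# Mock fields; every value below is invented for illustration only
w_list <- list(
  witness       = c(3, 5, 2),
  Z             = list(c(1, 4), 6, c(4, 6)),
  witness_score = c(0.41, 0.87, 0.12)
)
bounds <- rbind(c(-0.20, 0.60),
                c(-0.05, 0.50),
                c(-0.30, 0.70))
bounds_post <- cbind(lower = c(-0.10, 0.02, -0.05, 0.00, -0.08),
                     upper = c( 0.40, 0.55,  0.47, 0.51,  0.44))

# Index of the highest scoring witness/admissible set pair (cf. hw and hZ)
best <- which.max(w_list$witness_score)
hw <- w_list$witness[best]
hZ <- w_list$Z[[best]]

# Point estimates of the bounds for that pair, assuming rows of 'bounds'
# follow the order of w_list
bounds[best, ]

# Posterior summaries of each bound across the Monte Carlo samples
apply(bounds_post, 2, quantile, probs = c(0.025, 0.5, 0.975))
colMeans(bounds_post)
```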

References

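Silva and Evans (2014). Causal inference through a witness protection program. Advances in Neural Information Processing Systems, 27, 298–306. http://papers.nips.cc/paper/5602-causal-inference-through-a-witness-protection-program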

Examples

## Load the package
library(CausalFX)

## Generate a synthetic problem
problem <- simulateWitnessModel(p = 4, q = 4, par_max = 3, M = 200)

## Calculate true effect for evaluation purposes
sol_pop <- covsearch(problem, pop_solve = TRUE)
effect_pop <- synthesizeCausalEffect(problem)
cat(sprintf("ACE (true) = %1.2f\n", effect_pop$effect_real))

## WPP search (with a small number of Monte Carlo samples)
epsilons <- c(0.2, 0.2, 0.2, 0.2, 0.95, 1.05)
sol_wpp <- wpp(problem, epsilons, M = 100)
summary(sol_wpp)

rbas2015/CausalFX documentation built on May 27, 2019, 2:06 a.m.