covsearch: Search for Causal Effect Covariate Adjustment
In CausalFX: Methods for Estimating Causal Effects from Observational Data


Find the witnesses and adjustment sets (if any) for the average causal effect (ACE) of a given treatment variable X on a given outcome Y. This is done by an exhaustive search over a (reduced) set of candidate covariates. Currently, only binary data is supported.


covsearch(problem, max_set = 12, min_only = TRUE, prior_ind = 0.5,
  prior_table = 10, cred_calc = FALSE, M = 1000, stop_at_first = FALSE,
  pop_solve = FALSE, verbose = FALSE)

`problem`	a `cfx` problem instance for the ACE of a given treatment X on a given outcome Y.
`max_set`	maximum size of conditioning set. The cost of the procedure grows exponentially as a function of this, so be careful when increasing the default value.
`min_only`	if `TRUE`, then for each witness, once an admissible set of a particular size is found, do not look for larger ones.
`prior_ind`	prior probability of an independence.
`prior_table`	effective sample size hyperparameter of a Dirichlet prior for testing independence with contingency tables.
`cred_calc`	if `TRUE`, compute conditional credible intervals for the ACE of the highest-scoring model.
`M`	number of Monte Carlo samples used to compute (conditional) credible intervals, when these are required.
`stop_at_first`	if `TRUE`, stop as soon as some witness is found.
`pop_solve`	if `TRUE`, assume the population graph stored in `problem` is known and use it instead of the data.
`verbose`	if `TRUE`, print out more detailed information while running the procedure.

The method assumes that the variables given in problem (other than problem$X_idx and problem$Y_idx) are covariates that causally precede both treatment and outcome. It then applies the faithfulness condition of Spirtes, Glymour and Scheines (2000, Causation, Prediction and Search, MIT Press) to derive an admissible set: a set of covariates which, when adjusted for, removes all confounding between treatment and outcome. The necessary and sufficient conditions for finding an admissible set under the faithfulness assumption were discussed by Entner, Hoyer and Spirtes (2013, JMLR W&CP, vol. 31, 256–264). To prove that a set is admissible, some auxiliary variable in the covariate set is necessary; we call this variable a "witness." See Entner et al. for details.

It is possible that no witness exists, in which case the function returns an empty solution. Multiple witness/admissible set pairs might also exist. The criterion for finding a witness/admissible set pair requires testing conditional independence constraints. Each test is done by Bayesian model selection with a Dirichlet prior over the contingency table of the variables in problem, using the effective sample size hyperparameter prior_table, and a prior probability of the independence hypothesis given by the hyperparameter prior_ind.
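The following R code is a minimal sketch of such a test and of the witness criterion, for illustration only. The helper names (log_dirmult, strata_of, post_indep, witness_score) are hypothetical and are not part of the CausalFX API. The sketch assumes 0/1 data and spreads prior_table uniformly over the cells of each stratum's table, which is an assumption about how the effective sample size is allocated. It follows our reading of the Entner et al. rule: w is a witness for Z when w and Y are dependent given Z but independent given Z together with X. Scoring a pair by the product of the two posterior probabilities is also an assumption, not necessarily what covsearch does.

```r
# Hypothetical sketch, not CausalFX's internal code. Assumes 0/1 data.

# Log marginal likelihood of a count vector n under a Dirichlet(alpha) prior
log_dirmult <- function(n, alpha) {
  lgamma(sum(alpha)) - lgamma(sum(alpha) + sum(n)) +
    sum(lgamma(alpha + n) - lgamma(alpha))
}

# One stratum label per row of the conditioning set z (NULL means no conditioning)
strata_of <- function(z, n) {
  if (is.null(z) || NCOL(z) == 0) return(rep("all", n))
  apply(as.matrix(z), 1, paste, collapse = "")
}

# Posterior probability that binary x and y are independent given z.
# Assumption: prior_table is spread uniformly over the cells of each stratum.
post_indep <- function(x, y, z = NULL, prior_ind = 0.5, prior_table = 10) {
  s <- strata_of(z, length(x))
  k <- length(unique(s))
  log_dep <- 0
  log_ind <- 0
  for (lev in unique(s)) {
    i <- s == lev
    tab <- table(factor(x[i], levels = 0:1), factor(y[i], levels = 0:1))
    # Dependence model: one Dirichlet over the 4 joint cells
    log_dep <- log_dep + log_dirmult(as.vector(tab), rep(prior_table / (4 * k), 4))
    # Independence model: separate Dirichlets over the two margins
    log_ind <- log_ind +
      log_dirmult(rowSums(tab), rep(prior_table / (2 * k), 2)) +
      log_dirmult(colSums(tab), rep(prior_table / (2 * k), 2))
  }
  # Bayes rule with prior probability prior_ind on the independence hypothesis
  lp_ind <- log(prior_ind) + log_ind
  lp_dep <- log(1 - prior_ind) + log_dep
  1 / (1 + exp(lp_dep - lp_ind))
}

# Witness check for covariate column w and candidate set z (column indices):
# (i) w and Y dependent given z, (ii) w and Y independent given z plus X.
# Assumption: the pair is scored by the product of the two probabilities.
witness_score <- function(data, w, z, x_idx, y_idx, ...) {
  zz <- if (length(z) > 0) data[, z, drop = FALSE] else NULL
  p_dep <- 1 - post_indep(data[, w], data[, y_idx], zz, ...)
  p_ind <- post_indep(data[, w], data[, y_idx],
                      data[, c(z, x_idx), drop = FALSE], ...)
  p_dep * p_ind
}
```

With data generated as w -> X -> Y and no confounding, witness_score with an empty Z should be high, since w and Y are marginally dependent but become independent once X is conditioned on.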

For each witness/admissible set pair that passes this criterion, the function reports the posterior expected value of the implied ACE, obtained by plugging in the posterior expected value of the contingency table as an estimate of the joint distribution. For one particular witness/admissible set pair, chosen according to the best fit to the conditional independencies required by the criterion of Entner et al. (see also Silva and Evans, 2014, NIPS 298–306), we also calculate the posterior distribution of the ACE (when cred_calc = TRUE). This posterior does not take into account the uncertainty in the choice of witness/admissible set; instead, it is the conditional posterior given this choice.
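As an illustration of the plug-in and Monte Carlo steps, the sketch below (hypothetical functions ace_from_joint and ace_posterior, not the package API) uses the standard back-door adjustment formula for binary X and Y: ACE = sum over z of [P(Y = 1 | X = 1, z) - P(Y = 1 | X = 0, z)] P(z). It assumes a Dirichlet posterior over the joint table of (Z, X, Y) with prior_table spread evenly over the cells, computes the point estimate from the posterior expected table, and draws M tables from the Dirichlet posterior (as normalized gamma variates) to approximate the conditional posterior of the ACE.

```r
# Hypothetical sketch, not CausalFX's internal code. Assumes 0/1 x and y.

# Back-door adjustment from a joint probability array p[stratum, x + 1, y + 1]
ace_from_joint <- function(p) {
  pz <- apply(p, 1, sum)                             # P(z)
  py1_x1 <- p[, 2, 2] / (p[, 2, 1] + p[, 2, 2])      # P(Y = 1 | X = 1, z)
  py1_x0 <- p[, 1, 2] / (p[, 1, 1] + p[, 1, 2])      # P(Y = 1 | X = 0, z)
  sum((py1_x1 - py1_x0) * pz)
}

# Plug-in ACE and Monte Carlo posterior samples given adjustment set z
ace_posterior <- function(x, y, z = NULL, prior_table = 10, M = 1000) {
  s <- if (is.null(z)) rep("all", length(x)) else
    apply(as.matrix(z), 1, paste, collapse = "")
  counts <- table(factor(s), factor(x, levels = 0:1), factor(y, levels = 0:1))
  dims <- dim(counts)
  # Dirichlet posterior parameters (assumption: prior_table spread evenly over cells)
  alpha <- as.vector(counts) + prior_table / length(counts)
  # Plug-in estimate from the posterior expected contingency table
  point <- ace_from_joint(array(alpha / sum(alpha), dims))
  # Monte Carlo: each Dirichlet draw is a vector of normalized gamma variates
  samples <- replicate(M, {
    g <- rgamma(length(alpha), shape = alpha)
    ace_from_joint(array(g / sum(g), dims))
  })
  list(point = point, samples = samples)
}
```

In a simulated example where z confounds x and y, adjusting for z moves the estimate towards the true effect while the unadjusted estimate (z = NULL) is biased; quantile(fit$samples, c(0.025, 0.975)) then gives a conditional 95% credible interval.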

The search for a witness/admissible set pair is by brute force: for each witness, all subsets of the remaining covariates are evaluated as candidate admissible sets. If there are too many covariates (more than max_set), only a filtered set of size max_set is considered for each witness, so that at most 2^max_set subsets are evaluated per witness. This set is chosen by first scoring each covariate by its empirical mutual information with the witness conditioned on problem$X_idx, and then picking the top max_set elements, to which the brute-force search is applied.
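A hypothetical sketch of this search in R follows. cond_mi estimates the empirical conditional mutual information I(W; C | X) for binary vectors, filter_candidates keeps the top max_set covariates by that score, all_subsets enumerates candidate sets from smallest to largest, and brute_search loops over witnesses while honouring min_only and stop_at_first. The admissibility check is passed in as a function is_admissible(w, z), a placeholder for the independence-based criterion. None of these names belong to CausalFX, and the actual implementation may differ.

```r
# Hypothetical sketch, not CausalFX's internal code. Assumes 0/1 data.

# Empirical conditional mutual information I(a; b | cond) in nats
cond_mi <- function(a, b, cond) {
  f <- function(v) factor(v, levels = 0:1)
  p <- table(f(a), f(b), f(cond)) / length(a)
  mi <- 0
  for (k in 1:2) {
    pc <- sum(p[, , k])
    if (pc == 0) next
    pab <- p[, , k] / pc                              # P(a, b | cond = k)
    pe <- outer(rowSums(pab), colSums(pab))           # P(a | k) P(b | k)
    nz <- pab > 0
    mi <- mi + pc * sum(pab[nz] * log(pab[nz] / pe[nz]))
  }
  mi
}

# Keep the max_set candidates with the highest I(witness; candidate | X)
filter_candidates <- function(data, w, x_idx, candidates, max_set) {
  if (length(candidates) <= max_set) return(candidates)
  scores <- sapply(candidates, function(j) cond_mi(data[, w], data[, j], data[, x_idx]))
  candidates[order(scores, decreasing = TRUE)][seq_len(max_set)]
}

# All subsets of v, smallest first, including the empty set
all_subsets <- function(v) {
  out <- list(v[0])
  for (k in seq_along(v)) {
    idx <- combn(seq_along(v), k, simplify = FALSE)
    out <- c(out, lapply(idx, function(i) v[i]))
  }
  out
}

# Brute-force search over witnesses and candidate admissible sets
brute_search <- function(data, x_idx, y_idx, is_admissible,
                         max_set = 12, min_only = TRUE, stop_at_first = FALSE) {
  covs <- setdiff(seq_len(ncol(data)), c(x_idx, y_idx))
  found <- list()
  for (w in covs) {
    cand <- filter_candidates(data, w, x_idx, setdiff(covs, w), max_set)
    best_size <- Inf
    for (z in all_subsets(cand)) {
      # Subsets come in increasing size, so larger sets can be skipped at once
      if (min_only && length(z) > best_size) break
      if (is_admissible(w, z)) {
        found[[length(found) + 1]] <- list(witness = w, Z = z)
        best_size <- length(z)
        if (stop_at_first) return(found)
      }
    }
  }
  found
}
```

Ordering subsets by size is what makes min_only cheap: once a set of size k passes for a witness, the loop stops at the first larger set. Note that combn is called on seq_along(v), because combn(v, k) treats a single positive integer v as seq_len(v).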

A list containing problem plus the following items:

`witness`	array containing the indices of the witness variables.
`Z`	a list, where `Z[[i]]` is the i-th array containing the indices of the variables in the admissible set corresponding to witness `witness[i]`.
`witness_score`	array containing the scores of each witness/admissible set pair.
`hw`	witness corresponding to the highest scoring pair.
`hZ`	array containing admissible set corresponding to the highest scoring pair.
`ACEs`	array of average causal effects corresponding to each witness/admissible set pair.
`ACEs_post`	array of samples from the posterior distribution of the ACE implied by `hw` and `hZ`.

Entner, Hoyer and Spirtes (2013): http://jmlr.org/proceedings/papers/v31/entner13a.html

Silva and Evans (2014): http://papers.nips.cc/paper/5602-causal-inference-through-a-witness-protection-program

library(CausalFX)

## Generate a synthetic problem
problem <- simulateWitnessModel(p = 4, q = 4, par_max = 3, M = 1000)

## Idealized case: suppose we know the true distribution,
## get "exact" ACE estimands for different adjustment sets
sol_pop <- covsearch(problem, pop_solve = TRUE)
effect_pop <- synthetizeCausalEffect(problem)
cat(sprintf(
  "ACE (true) = %1.2f\nACE (adjusting for all) = %1.2f\nACE (adjusting for nothing) = %1.2f\n",
  effect_pop$effect_real, effect_pop$effect_naive, effect_pop$effect_naive2))

## Perform inference and report results
covariate_hat <- covsearch(problem, cred_calc = TRUE, M = 1000)
summary(covariate_hat)