knitr::opts_chunk$set(echo = TRUE)

I've been working on a causal inference problem recently. In causal inference you collect data and try to infer causality. But sometimes you know causality in some parts in the system before you collect any data. Or maybe you don't know for certain, but you think there is probably some causality in parts of the system. Blah blah [@fenner2012a; @chickering1995transformational].

Wouldn't it be nice if you could form some kind of causal prior?

PDAGs and Point Estimates

Let's work an example. I work with the Asia model, a small synthetic model from Lauritzen and Spiegelhalter (1988) about lung diseases (tuberculosis, lung cancer or bronchitis) and visits to Asia (this was not intended as a causal model, here we suppose it is). Here is the causal directed acyclic graph (DAG) for the Asia model:

library(bnlearn)
net1 <- empty.graph(names(asia))
modelstring(net1) <- "[A][S][T|A][L|S][B|S][D|B:E][E|T:L][X|E]"
graphviz.plot(net1)

I use the cpdag() algorithm to generate the following graph, which illustrates what causal information can be inferred from data. I'll discuss this algorithm shortly.

graphviz.plot(cpdag(net1))

The undirected edges are edges whose causal direction cannot be learned from data alone. More formally, different causal graphs where these edges are oriented in different directions belong to the same Markov equivalence class, meaning a class of graphs that are statistically equivalent to one another.

For example, here is another member of the class.

net2 <- empty.graph(names(asia))
modelstring(net2) <- "[T][B][A|T][L|S][S|B][D|B:E][E|T:L][X|E]"
graphviz.plot(net2)

We can see that these are Markov equivalent by noting both graphs equally likely (have the same log-likelihood) given the data.

score(net1, asia, type="loglik")
score(net2, asia, type="loglik")

In non-Bayesian settings, you find the model that is the most likely, and that is your best guess (point estimator) for what the true model is. So this result is important, because it means that if you pick a single graph as your best guess, if that graph belongs is part of an equivalence class with multiple members, then you know your best guess is no better than using other members of the class.

cpdag() is an algorithm that generates a "partially directed acyclic graph" (PDAG) which uses undirected edges to show which directed edges vary across members of the class [@dor1992]. Indeed, the PDAG itself becomes a representative of the class. What is great about this algorithm is that it doesn't need data, it just figures this out from a DAG's structure.

This is useful because you can use the entire PDAG as your best guess. The PDAG represents a class of equivalent DAGs -- the PDAG says that while your best guess cannot explain causality completely, it can for the edges that are directed because each member of the class has that directed edge.

The Bayesian Approach

Bayesians don't think "this is my one estimate of the true model", rather we think in terms of how probable a model is relative to others. Therefore, the Bayesian concept of Markov equivalence should mean two models are equally probable (not just equally likely).

"Equally probable" means having the same posterior probability. Indeed, the two above graphs have the same posterior probability if the prior doesn't favor certain edges over others. In the following example I use the Bayesian Dirichlet equivalent score, which is used to learn sparse graphs in structure learning.

score(net1, asia, type="bde", prior = "uniform")
score(net2, asia, type="bde", prior = "uniform")

Now let us randomly generate a table of edge-wise prior probabilities.

beta <- as.data.frame(arcs(net1), stringsAsFactors = F)
beta$prob <- runif(narcs(net1))
beta

I can convert this edge-wise prior into a graph prior using the method described in [@cs]. Now in this case we get different scores:

score(net1, asia, type="bde", prior = "cs", beta = beta)
score(net2, asia, type="bde", prior = "cs", beta = beta)

Here is the problem, cpdag() hasn't changed. It still reports that these graphs come from the same equivalence class...

identical(cpdag(net1), cpdag(net2))

So Markov equivalence only refers to likelihood equivalence, not posterior equivalence unless we use some prior that doesn't care about edge direction. This is a problem in causal inference because edge direction means causal direction, which is what we are interested in.

So how can we adjust the cpdag() algorithm to think about causal priors?

Causal Bayesian Equivalence Classes

This "edge posterior-equivalance class" should satisfy the following requirements.

1 It satisfies all the structural properties of an equivalence class described in [chickering1995transformational], such as each member having the same skeleton. 2 Each member should have the same posterior.

The posterior is proportional to the product of the likelihood and the prior. So the cpdag() algorithm as it stands should be able to satisfy 1. Members of an equivalence class returned by cpdag() will be equivalent in the likelihood, not the prior. So the new equivalence class will have to be a subset of the class returned by cpdag(). In other words, this new PDAG should have the same skeleton as the cpdag(), but more directed edges.

To understand how to adjust the cpdag() algorithm, let's figure out when it does work with a prior. We have already seen this works for non-edge specific priors, like BDe.

Two graphs of the same cpdag() class will also have the same posterior if the edges that differ between the two graphs have the same prior probability for either direction. In the following example, I assign adjust the random prior edge-probabilities to one where A -> T and A <- T both have a probability of .4, and B -> S and S <- B both have a prior probability of .3.

beta2 <- beta
beta2[1, ]$prob <- .4 # A -> T
beta2 <- rbind(beta2, c("T", "A", .4))
beta2[3, ]$prob <- .3 # S -> B
beta2 <- rbind(beta2, c("B", "S", .3))
beta2$prob <- as.numeric(beta2$prob)
score(net1, asia, type="bde", prior = "cs", beta = beta2)
score(net2, asia, type="bde", prior = "cs", beta = beta2)

The only edges that can have differing priors are those in open (immoral or unshielded) v-structures. The reason is that direction of edges in open v-structures must be consistent across all members of the class, so changing the prior for those edges affects all members the same way. For example, the T->E edge in the Asia net is in an open v-structure. Assigning T->E a prior of .7 and E->T a prior of .2 yields the same posterior score for net1 and net2.

beta3 <- beta2
beta3[6, ]$prob <- .7 # T -> E
beta3 <- rbind(beta3, c("E", "T", .2))
beta3$prob <- as.numeric(beta3$prob)
score(net1, asia, type="bde", prior = "cs", beta = beta3)
score(net2, asia, type="bde", prior = "cs", beta = beta3)

So here is my adjustment for the cpdag() algorithm.

In other words, the only time cpdag() as it stands should work for edge-wise priors is if the edge-wise priors

the only the edges that can differ in their prior probabilities are edges in immoral v-structures, can have a more informed prior probability. (This suggests that a prior based on v-structures might be easier to work with, but encoding subjective knowledge into such a prior would be unintuitive.)

This suggests a simple modification to an algorithm for finding an equivalence class PDAG given a DAG. If the prior probability on the orientation of the edge is not uniform in both directions, then it cannot be undirected.

References



robertness/bninfo documentation built on May 27, 2019, 10:32 a.m.