```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
oldpar <- par(no.readonly = TRUE)
oldoptions <- options()
```
```r
library(BCDAG)
```
This is the second of a series of three vignettes for the R package BCDAG.
In this vignette we focus on the function `learn_DAG()`, which implements a Markov Chain Monte Carlo (MCMC) algorithm to sample from the joint posterior of DAG structures and DAG parameters under the Gaussian assumption.
The underlying Bayesian Gaussian DAG-model can be summarized as follows:
\begin{eqnarray}
X_1, \dots, X_q \,|\, \boldsymbol L, \boldsymbol D, \mathcal{D} &\sim& \mathcal{N}_q\left(\boldsymbol 0, (\boldsymbol{L}\boldsymbol{D}^{-1}\boldsymbol{L}^\top)^{-1}\right) \\
(\boldsymbol L, \boldsymbol D)\,|\,\mathcal{D} &\sim& \text{DAG-Wishart}(\boldsymbol{a}_{c}^{\mathcal{D}}, \boldsymbol U) \\
p(\mathcal{D}) &\propto& w^{|\mathcal{S}_{\mathcal{D}}|}(1-w)^{\frac{q(q-1)}{2} - |\mathcal{S}_{\mathcal{D}}|}
\end{eqnarray}
In particular, $\mathcal{D}=(V,E)$ denotes a DAG structure with set of nodes $V=\{1,\dots,q\}$ and set of edges $E\subseteq V \times V$. Moreover, $(\boldsymbol L, \boldsymbol D)$ are model parameters providing the decomposition of the precision (inverse-covariance) matrix $\boldsymbol{\Omega} = \boldsymbol{L}\boldsymbol{D}^{-1}\boldsymbol{L}^\top$; specifically, $\boldsymbol{L}$ is a $(q, q)$ matrix of coefficients such that for each $(u, v)$-element $\boldsymbol{L}_{uv}$ with $u \ne v$, $\boldsymbol{L}_{uv} \ne 0$ if and only if $(u, v) \in E$, while $\boldsymbol{L}_{uu} = 1$ for each $u = 1,\dots, q$; also, $\boldsymbol{D}$ is a $(q, q)$ diagonal matrix with $(u, u)$-element $\boldsymbol{D}_{uu}$.
The latter decomposition follows from the equivalent Structural Equation Model (SEM) representation of a Gaussian DAG-model; see also Castelletti \& Mascaro (2021).
Conditionally on $\mathcal{D}$, a prior on $(\boldsymbol{L}, \boldsymbol{D})$ is assigned through a compatible DAG-Wishart distribution with rate hyperparameter $\boldsymbol{U}$, a $(q,q)$ s.p.d. matrix, and shape hyperparameter $\boldsymbol{a}^{\mathcal{D}}_{c}$, a $(q,1)$ vector; see also Cao et al. (2019) and Peluso \& Consonni (2020).
Finally, a prior on DAG $\mathcal{D}$ is assigned through a Binomial distribution on the number of edges in the graph; in $p(\mathcal{D})$, $w \in (0,1)$ is the prior probability of edge inclusion, while $|\mathcal{S}_{\mathcal{D}}|$ denotes the number of edges in $\mathcal{D}$; see again Castelletti \& Mascaro (2021) for further details.
The target of the MCMC scheme is therefore the joint posterior of $(\boldsymbol{L},\boldsymbol{D},\mathcal{D})$,
\begin{equation}
p(\boldsymbol L, \boldsymbol D, \mathcal{D}\,|\, \boldsymbol X) \propto p(\boldsymbol{X}\,|\,\boldsymbol L, \boldsymbol D, \mathcal{D})\,p(\boldsymbol{L},\boldsymbol{D}\,|\,\mathcal{D})\,p(\mathcal{D}),
\end{equation}
where $\boldsymbol{X}$ denotes the $(n,q)$ data matrix obtained through $n$ i.i.d. draws from the Gaussian DAG-model and $p(\boldsymbol{X}\,|\,\boldsymbol L, \boldsymbol D, \mathcal{D})$ is the likelihood function. See also Castelletti \& Mascaro (2022+) for full details.
We first randomly generate a DAG $\mathcal{D}$, the DAG parameters $(\boldsymbol{L},\boldsymbol{D})$ and then $n=1000$ i.i.d. observations from a Gaussian DAG-model as follows:
```r
set.seed(1)
q <- 8
w <- 0.2
DAG <- rDAG(q, w)                         # DAG with q nodes and probability of edge inclusion w
a <- q
U <- diag(1, q)
outDL <- rDAGWishart(n = 1, DAG, a, U)    # draw (L, D) from a compatible DAG-Wishart prior
L <- outDL$L; D <- outDL$D
Omega <- L %*% solve(D) %*% t(L)          # precision matrix
Sigma <- solve(Omega)                     # covariance matrix
n <- 1000
X <- mvtnorm::rmvnorm(n = n, sigma = Sigma)   # n i.i.d. observations from N(0, Sigma)
```
See also our vignette about data generation from Gaussian DAG-models.
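As a quick, optional sanity check on the simulated data (not part of the original workflow), one may verify that the inverse of the sample covariance matrix roughly recovers the true precision matrix $\boldsymbol{\Omega}$ used to generate the data:

```r
# With n = 1000 observations and q = 8 variables, the inverse of the sample
# covariance matrix should be roughly close to the true precision matrix Omega
dim(X)
round(Omega, 2)
round(solve(cov(X)), 2)
```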
## `learn_DAG()`
The function `learn_DAG()` implements an MCMC algorithm to sample from the joint posterior of DAGs and DAG parameters. This is based on a Partial Analytic Structure (PAS) algorithm (Godsill, 2012) which, at each iteration:

1. updates the DAG $\mathcal{D}$ through a Metropolis-Hastings step (see also [A note on `fast = TRUE`]);
2. samples the DAG parameters $(\boldsymbol{L}, \boldsymbol{D})$ from their posterior distribution, conditionally on the accepted DAG.

We implement it as follows:
```r
out <- learn_DAG(S = 5000, burn = 1000, data = X, a, U, w,
                 fast = FALSE, save.memory = FALSE, collapse = FALSE)
```
The inputs of `learn_DAG()` correspond to three different sets of arguments:

* `S`, `burn` and `data` are standard inputs required by any MCMC algorithm. In particular, `S` defines the desired length of the chain, which is obtained by discarding the first `burn` draws (the total number of sampled draws is therefore `S + burn`); `data` is instead the $(n,q)$ data matrix $\boldsymbol{X}$;
* `a`, `U` and `w` are hyperparameters of the priors on DAGs (`w`) and DAG parameters (`a`, `U`); see also Equation [REF]. The same hyperparameters appear in functions `rDAG()` and `rDAGWishart()`, which were introduced in our vignette [ADD REF TO THE VIGNETTE];
* `fast`, `save.memory` and `collapse` are boolean arguments which allow one to: implement an approximate proposal distribution within the MCMC scheme (`fast = TRUE`); change the array structure of the stored MCMC output into strings in order to save memory (`save.memory = TRUE`); implement an MCMC for DAG structure learning only, without sampling from the posterior of parameters (`collapse = TRUE`).
See also [A note on `fast = TRUE`] and Castelletti \& Mascaro (2022+) for full details.

The output of `learn_DAG()` is an object of class `bcdag`:
```r
class(out)
```
`bcdag` objects include the output of the MCMC algorithm together with a collection of meta-data representing the input arguments of `learn_DAG()`; these are stored in the attributes of the object:
```r
str(out)
```
The attribute `type` refers to the output of `learn_DAG()`, whose structure depends on the choice of the arguments `save.memory` and `collapse`.
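For instance, the type of the object obtained above can be inspected directly:

```r
# Inspect which kind of bcdag object was returned
attributes(out)$type
```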
When both are set equal to `FALSE`, as in the previous example, the output of `learn_DAG()` is a complete `bcdag` object, collecting three $(q,q,S)$ arrays with the DAG structures (in the form of $q \times q$ adjacency matrices) and the DAG parameters $\boldsymbol{L}$ and $\boldsymbol{D}$ (both as $q \times q$ matrices) sampled by the MCMC:
```r
out$Graphs[,,1]
round(out$L[,,1], 2)
round(out$D[,,1], 2)
```
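As an example of how the sampled graphs can be summarized, averaging the adjacency matrices over the $S$ draws gives a Monte Carlo estimate of the posterior probabilities of edge inclusion, which should agree with the output of `get_edgeprobs()` used later in this vignette:

```r
# Monte Carlo estimate of the posterior probability of each edge (u, v),
# obtained by averaging the S sampled adjacency matrices
round(apply(out$Graphs, c(1, 2), mean), 2)
```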
When `collapse = TRUE` and `save.memory = FALSE`, the output of `learn_DAG()` is a collapsed `bcdag` object, consisting of a $(q,q,S)$ array with the adjacency matrices of the DAGs sampled by the MCMC:
```r
collapsed_out <- learn_DAG(S = 5000, burn = 1000, data = X, a, U, w,
                           fast = FALSE, save.memory = FALSE, collapse = TRUE)
```
```r
names(collapsed_out)
class(collapsed_out)
attributes(collapsed_out)$type
collapsed_out$Graphs[,,1]
```
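Even a collapsed object supports posterior summaries of the graph structure; the following is a minimal sketch (not part of the package output) of the posterior distribution of the number of edges across the sampled DAGs:

```r
# Number of edges in each sampled DAG (sum of each q x q adjacency matrix)
n_edges <- apply(collapsed_out$Graphs, 3, sum)
table(n_edges)
```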
When `save.memory = TRUE` and `collapse = FALSE`, the output is a compressed `bcdag` object, collecting samples from the joint posterior on DAGs and DAG parameters in the form of a vector of strings:
```r
compressed_out <- learn_DAG(S = 5000, burn = 1000, data = X, a, U, w,
                            fast = FALSE, save.memory = TRUE, collapse = FALSE)
```
```r
names(compressed_out)
class(compressed_out)
attributes(compressed_out)$type
```
In such a case, we can access the MCMC draws as follows:
```r
compressed_out$Graphs[1]
compressed_out$L[1]
compressed_out$D[1]
```
In addition, we implement `bd_decode`, an internal function that can be used to visualize the previous objects as matrices:
```r
BCDAG:::bd_decode(compressed_out$Graphs[1])
round(BCDAG:::bd_decode(compressed_out$L[1]), 2)
round(BCDAG:::bd_decode(compressed_out$D[1]), 2)
```
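If needed, the full $(q,q,S)$ array of sampled adjacency matrices can be recovered from the compressed representation; the following minimal sketch assumes, as above, that the internal `bd_decode()` returns a $(q,q)$ matrix for each string:

```r
# Rebuild the (q, q, S) array of sampled adjacency matrices from the string representation
Graphs_array <- sapply(compressed_out$Graphs, BCDAG:::bd_decode, simplify = "array")
dim(Graphs_array)
```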
Finally, if `save.memory = TRUE` and `collapse = TRUE`, the output of `learn_DAG()` is a compressed and collapsed `bcdag` object, collecting only the sampled DAGs represented as a vector of strings:
```r
comprcoll_out <- learn_DAG(S = 5000, burn = 1000, data = X, a, U, w,
                           fast = FALSE, save.memory = TRUE, collapse = TRUE)
```
```r
names(comprcoll_out)
class(comprcoll_out)
attributes(comprcoll_out)$type
BCDAG:::bd_decode(comprcoll_out$Graphs[1])
```
## A note on `fast = TRUE`
Step 1. of the MCMC scheme implemented by `learn_DAG()` updates DAG $\mathcal{D}$ by randomly drawing a new candidate DAG $\mathcal{D}'$ from a proposal distribution and then accepting it with probability given by the Metropolis-Hastings (MH) acceptance rate; see also Castelletti \& Mascaro (2021).
For a given DAG $\mathcal{D}$, the proposal distribution $q(\mathcal{D}'\,|\,\mathcal{D})$ is built over the set $\mathcal{O}_{\mathcal{D}}$ of *all* direct successor DAGs that can be reached from $\mathcal{D}$ by inserting, deleting or reversing a single edge in $\mathcal{D}$.
A DAG $\mathcal{D}'$ is then proposed uniformly from the set $\mathcal{O}_{\mathcal{D}}$, so that $q(\mathcal{D}'\,|\,\mathcal{D})=1/|\mathcal{O}_{\mathcal{D}}|$.
Moreover, the MH rate requires evaluating the ratio of proposals $q(\mathcal{D}'\,|\,\mathcal{D})/q(\mathcal{D}\,|\,\mathcal{D}') = |\mathcal{O}_{\mathcal{D}'}|/|\mathcal{O}_{\mathcal{D}}|$, and accordingly the construction of both $\mathcal{O}_{\mathcal{D}}$ and $\mathcal{O}_{\mathcal{D}'}$.
If `fast = FALSE`, the proposal ratio is computed exactly; this requires the enumeration of $\mathcal{O}_{\mathcal{D}}$ and $\mathcal{O}_{\mathcal{D}'}$, which may become computationally expensive, especially when $q$ is large.
However, the ratio approaches $1$ as the number of variables $q$ increases: option `fast = TRUE` implements this approximation, which therefore avoids the construction of $\mathcal{O}_{\mathcal{D}}$ and $\mathcal{O}_{\mathcal{D}'}$.
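To make the role of $\mathcal{O}_{\mathcal{D}}$ concrete, the following self-contained sketch (not part of BCDAG, and assuming the 0-1 adjacency-matrix representation returned by `rDAG()`) counts the direct successors of the DAG generated above:

```r
# Check whether an adjacency matrix A (A[u, v] = 1 means an edge u -> v) is acyclic
# by repeatedly removing sink nodes (nodes with no outgoing edges)
is_acyclic <- function(A) {
  while (nrow(A) > 0) {
    sinks <- which(rowSums(A) == 0)
    if (length(sinks) == 0) return(FALSE)
    A <- A[-sinks, -sinks, drop = FALSE]
  }
  TRUE
}

# Count |O_D|: DAGs reachable from A by inserting, deleting or reversing one edge
count_successors <- function(A) {
  q <- nrow(A); count <- 0
  for (u in 1:q) for (v in 1:q) {
    if (u == v) next
    B <- A
    if (A[u, v] == 1) {
      count <- count + 1              # deleting an edge always yields a DAG
      B[u, v] <- 0; B[v, u] <- 1      # reversal: valid only if the result is acyclic
      if (is_acyclic(B)) count <- count + 1
    } else if (A[v, u] == 0) {
      B[u, v] <- 1                    # insertion: valid only if the result is acyclic
      if (is_acyclic(B)) count <- count + 1
    }
  }
  count
}

count_successors(DAG)   # |O_D| for the DAG generated above
```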
A comparison between `fast = FALSE` and `fast = TRUE` in the execution of `learn_DAG()` produces the following results in terms of computational time:
```r
# No approximation
time_nofast <- system.time(
  out_nofast <- learn_DAG(S = 5000, burn = 1000, data = X, a, U, w,
                          fast = FALSE, save.memory = FALSE, collapse = FALSE)
)
# Approximation
time_fast <- system.time(
  out_fast <- learn_DAG(S = 5000, burn = 1000, data = X, a, U, w,
                        fast = TRUE, save.memory = FALSE, collapse = FALSE)
)
```
```r
time_nofast
time_fast
```
Finally, the corresponding estimated posterior probabilities of edge inclusion are the following:
```r
round(get_edgeprobs(out_nofast), 2)
round(get_edgeprobs(out_fast), 2)
```
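As a rough agreement check between the exact and approximate proposals (a sketch, not part of the original comparison), one can look at the maximum absolute difference between the two estimated matrices of edge inclusion probabilities:

```r
# The two estimates should be close if the approximation is accurate
max(abs(get_edgeprobs(out_nofast) - get_edgeprobs(out_fast)))
```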
## References

Ben-David E, Li T, Massam H, Rajaratnam B (2015). "High dimensional Bayesian inference for Gaussian directed acyclic graph models." arXiv pre-print.

Cao X, Khare K, Ghosh M (2019). "Posterior graph selection and estimation consistency for high-dimensional Bayesian DAG models." The Annals of Statistics, 47(1), 319–348.

Castelletti F, Consonni G (2021). "Bayesian causal inference in probit graphical models." Bayesian Analysis, In press.

Castelletti F, Mascaro A (2021). "Structural learning and estimation of joint causal effects among network-dependent variables." Statistical Methods & Applications, 30, 1289–1314.

Castelletti F, Mascaro A (2022). "BCDAG: An R package for Bayesian structural and causal learning of Gaussian DAGs." arXiv pre-print.

Godsill SJ (2012). "On the relationship between Markov chain Monte Carlo methods for model uncertainty." Journal of Computational and Graphical Statistics, 10(2), 230–248.

Peluso S, Consonni G (2020). "Compatible priors for model selection of high-dimensional Gaussian DAGs." Electronic Journal of Statistics, 14(2), 4110–4132.
```r
par(oldpar)
options(oldoptions)
```