structmcmc

structmcmc is a set of tools for performing structural inference for Bayesian networks using MCMC in R, a free software environment for statistical computing and graphics.

The widely-used MC3 algorithm (Madigan & Raftery, 1995) is implemented, as well as a number of variants of the algorithm. The MC3 algorithm is a Metropolis-Hastings sampler for which the target distribution is the posterior distribution of Bayesian networks. Tools for exact solutions are also available, but for networks with more than, say, 6 nodes, these will be prohibitively slow.
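
For intuition, here is a minimal sketch of the Metropolis-Hastings step that underlies MC3 (an illustration, not structmcmc's implementation): a single-edge change to the current DAG is proposed, and accepted with probability min(1, exp(difference in log scores)), assuming a symmetric proposal. The log_score function below is only a placeholder for the log marginal likelihood plus log structural prior that structmcmc actually computes.

# Toy Metropolis-Hastings step over DAG adjacency matrices (illustrative only).
# adj[i, j] == 1 means an edge i -> j.
log_score <- function(adj) -sum(adj)   # placeholder score: penalises edges

mh_step <- function(adj, log_score) {
  p <- nrow(adj)
  # Propose toggling one off-diagonal entry (add or delete a single edge);
  # a full MC3 move set also includes edge reversals.
  repeat {
    i <- sample(p, 1)
    j <- sample(p, 1)
    if (i != j) break
  }
  proposal <- adj
  proposal[i, j] <- 1 - proposal[i, j]
  # A real sampler must also reject proposals that create a directed cycle
  # (see the acyclicity sketch below).
  log_accept <- log_score(proposal) - log_score(adj)  # symmetric proposal assumed
  if (log(runif(1)) < log_accept) proposal else adj
}

adj <- matrix(0, 3, 3)            # start from the empty graph on 3 nodes
adj <- mh_step(adj, log_score)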

The package includes an implementation of the fast Gibbs sampler introduced in Goudie & Mukherjee, 2016.

The implementation allows the local conditional distributions to be multinomial or Gaussian, using standard priors. Arbitrary structural priors for the Bayesian network can be specified. The main difficulty in sampling Bayesian networks efficiently is ensuring that the acyclicity constraint is not violated. The package implements the cycle-checking method introduced by King & Sagert (2002), an alternative to the method introduced by Giudici & Castelo (2003). To enable convergence to be assessed, a number of tools for creating diagnostic plots are included.
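
King & Sagert's approach maintains an incremental ancestor (reachability) matrix so that each cycle query is cheap. The sketch below is not that method; it simply illustrates the check itself, using the fact that a directed graph is acyclic if and only if source nodes can be peeled off repeatedly, and that adding an edge i -> j to a DAG creates a cycle exactly when j already reaches i.

# Simple, non-incremental acyclicity check on an adjacency matrix (illustration only).
# adj[i, j] == 1 means an edge i -> j.
is_acyclic <- function(adj) {
  remaining <- seq_len(nrow(adj))
  while (length(remaining) > 0) {
    # In-degrees within the remaining subgraph; nodes with in-degree 0 are sources
    indeg <- colSums(adj[remaining, remaining, drop = FALSE])
    sources <- remaining[indeg == 0]
    if (length(sources) == 0) return(FALSE)  # every remaining node lies on a cycle
    remaining <- setdiff(remaining, sources)
  }
  TRUE
}

# Would adding the edge i -> j to a DAG create a cycle?
adds_cycle <- function(adj, i, j) {
  adj[i, j] <- 1
  !is_acyclic(adj)
}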

Interfaces to a number of other R packages for Bayesian networks are available, including deal (hill-climbing and heuristic search), bnlearn (a number of constraint-based and score-based algorithms) and pcalg (PC-algorithm). An interface to gRain is also included to allow its probability propagation routines to be used easily.

Basic operation, discrete data

Each random variable has a multinomial distribution, with a conjugate Dirichlet prior.

Data must be supplied as a data.frame with p columns (corresponding to the p random variables) and n rows (corresponding to the n samples). Each column must be a factor variable.

x1 <- factor(c("a", "a", "g", "c", "c", "a", "g", "a", "a"))
x2 <- factor(c(2, 2, 4, 3, 1, 4, 4, 4, 1))
x3 <- factor(c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE))
x <- data.frame(x1 = x1, x2 = x2, x3 = x3)

Draw samples from the posterior using MC3.

set.seed(1234)
initial <- bn(c(), c(), c())
priorgraph <- bn(c(), c(1), c(2))
prior <- priorGraph(priorgraph, 0.5)
mcmc <- posterior(data = x, method = "mc3", prior = prior,
                  nSamples = 10000, nBurnin = 1000, initial = initial)

Compute and plot estimated edge probabilities given by the MCMC run

epmcmc <- ep(mcmc)
levelplot(epmcmc)

Since this is a problem with p = 3, we can compute the posterior edge probabilities by exhaustive enumeration. This is only feasible for p <= 6 or so.

exact <- posterior(x, "exact", prior = prior)
epexact <- ep(exact)
levelplot(epexact)
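
The p <= 6 guideline reflects how quickly the number of possible DAGs grows. Robinson's recurrence counts the labelled DAGs on n nodes; the short calculation below (independent of structmcmc) shows why exhaustive enumeration soon becomes hopeless.

# Number of labelled DAGs on n nodes, via Robinson's recurrence (for illustration)
n_dags <- function(p) {
  a <- numeric(p + 1)
  a[1] <- 1                       # a[n + 1] holds the count for n nodes
  for (n in seq_len(p)) {
    k <- seq_len(n)
    a[n + 1] <- sum((-1)^(k + 1) * choose(n, k) * 2^(k * (n - k)) * a[n - k + 1])
  }
  a[p + 1]
}
sapply(1:7, n_dags)
# 1  3  25  543  29281  3781503  1138779265

With seven nodes there are already over a billion DAGs to score, which is why sampling becomes the practical option.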

Comparing multiple MCMC runs

mcmc2 <- posterior(data = x, method = "mc3", prior = prior,
                   nSamples = 10000, nBurnin = 1000, initial = initial)
epmcmc2 <- ep(mcmc2)
levelplot(epmcmc2)

Compare the final edge probabilities between runs

splom(bnpostmcmc.list(mcmc, mcmc2))
levelplot(ep.list(exact = epexact, mcmc = epmcmc))

Plot how the cumulative edge probabilities change as samples are drawn.

splom(cumep(bnpostmcmc.list(mcmc, mcmc2)))

Plot how the moving-average edge probabilities change as samples are drawn.

splom(mwep(bnpostmcmc.list(mcmc, mcmc2)))

Plot how the cumulative total variation distance changes as samples are drawn.

exactgp <- gp(exact)
splom(cumtvd(exactgp, bnpostmcmc.list(mcmc, mcmc2)))
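
For reference, the quantity tracked by cumtvd is the total variation distance between the exact posterior over graphs (exactgp above) and the distribution of the MCMC samples: for discrete distributions p and q on the same support it equals half the L1 distance. A tiny stand-alone illustration (not the package's internals):

# Total variation distance between two discrete distributions on the same support
tvd <- function(p, q) 0.5 * sum(abs(p - q))

tvd(c(0.5, 0.3, 0.2), c(0.4, 0.4, 0.2))
# 0.1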

Basic operation, continuous data

Each random variable is assumed to be normally distributed, with a g-prior.

Data must be supplied as a matrix with p columns (corresponding to the p random variables) and n rows (corresponding to the n samples). Each column must be a numeric variable.

x1 <- rnorm(20)
x2 <- rnorm(20)
x3 <- rnorm(20)
x <- matrix(c(x1, x2, x3), ncol = 3)

Draw samples from the posterior using MC3.

set.seed(1234)
initial <- bn(c(), c(), c())
mcmc <- posterior(data = x, method = "mc3",
                  logScoreFUN = logScoreNormalFUN(),
                  nSamples = 10000, nBurnin = 1000, initial = initial)

Compute and plot estimated edge probabilities given by the MCMC run

epmcmc <- ep(mcmc)
levelplot(epmcmc)

Since this is a problem with p = 3, we can compute the posterior edge probabilities by exhaustive enumeration. This is only feasible for p <= 6 or so.

exact <- posterior(x, "exact", logScoreFUN = logScoreNormalFUN())
epexact <- ep(exact)
levelplot(epexact)

Comparing multiple MCMC runs

mcmc2 <- posterior(data = x, method = "mc3",
                   logScoreFUN = logScoreNormalFUN(),
                   nSamples = 10000, nBurnin = 1000, initial = initial)
epmcmc2 <- ep(mcmc2)
levelplot(epmcmc2)

Compare the final edge probabilities between runs

splom(bnpostmcmc.list(mcmc, mcmc2))
levelplot(ep.list(exact = epexact, mcmc = epmcmc))

Plot how the cumulative edge probabilities change as samples are drawn.

splom(cumep(bnpostmcmc.list(mcmc, mcmc2)))

Plot how the moving-average edge probabilities change as samples are drawn.

splom(mwep(bnpostmcmc.list(mcmc, mcmc2)))

Plot how the cumulative total variation distance changes as samples are drawn.

exactgp <- gp(exact)
splom(cumtvd(exactgp, bnpostmcmc.list(mcmc, mcmc2)))

Installation

Download the current version and unzip the file. Then install from within R using the following, where rjbgoudie-structmcmc-XXXXX is the name of the unzipped directory/folder, and path/to/rjbgoudie-structmcmc-XXXXX is the path to that folder.

install.packages("path/to/rjbgoudie-structmcmc-XXXXX", repos = NULL, type = "source")

The package depends on parental, which provides a very lightweight directed graph object and basic manipulation tools for R.
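
If you prefer installing directly from GitHub, the remotes package offers a one-step alternative to the manual source install above. The structmcmc repository is rjbgoudie/structmcmc; the repository name assumed for parental below is a guess along the same lines.

# install.packages("remotes")                   # if remotes is not already installed
remotes::install_github("rjbgoudie/parental")    # dependency; repository name assumed
remotes::install_github("rjbgoudie/structmcmc")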

Also required are lattice, and grid, both of which are included with R.

The gRain, nnet, reshape and zoo packages are also recommended; they can be installed from CRAN, e.g. using

install.packages("reshape")

How to cite

Goudie, R. J. B., & Mukherjee, S. (2016). A Gibbs Sampler for Learning DAGs. Journal of Machine Learning Research, 17(30), 1-39. http://jmlr.org/papers/v17/14-486.html


