structmcmc is a set of tools for performing structural inference for Bayesian networks using MCMC in R, a free software environment for statistical computing and graphics.
The widely-used MC3 algorithm (Madigan & Raftery, 1995) is implemented, as well as a number of variants of the algorithm. The MC3 algorithm is a Metropolis-Hastings sampler whose target distribution is the posterior distribution over Bayesian networks. Tools for exact posterior computation are also available but, because the number of possible DAGs grows super-exponentially with the number of nodes, these are prohibitively slow for networks with more than about 6 nodes.
The package includes an implementation of the fast Gibbs sampler introduced in Goudie & Mukherjee, 2016.
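No worked example of the Gibbs sampler appears in the walkthrough below, so the following is only a sketch. It assumes that the sampler is exposed through the same posterior() interface as the other methods, via method = "gibbs" (this option name is an assumption, not taken from the package documentation), and that a data frame x and an initial network initial have been set up as in the multinomial example below.
# Sketch only: method = "gibbs" is an assumed option name; x and initial are
# defined as in the multinomial example below.
gibbs <- posterior(data = x, method = "gibbs",
                   nSamples = 10000, nBurnin = 1000, initial = initial)
ep(gibbs)  # estimated edge probabilities from the Gibbs run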
The implementation allows the local conditional distributions to be multinomial or Gaussian, using standard priors. Arbitrary structural priors for the Bayesian network can be specified. The main difficulty in sampling Bayesian networks efficiently is ensuring that the acyclicity constraint is not violated. The package implements the cycle-checking method introduced by King & Sagert (2002), which is an alternative to the method of Giudici & Castelo (2003). To enable convergence to be assessed, a number of tools for creating diagnostic plots are included.
Interfaces to a number of other R packages for Bayesian networks are available, including deal (hill-climbing and heuristic search), bnlearn (a number of constraint-based and score-based algorithms) and pcalg (the PC algorithm). An interface to gRain is also included, to allow its probability propagation routines to be used easily.
Each random variable has a multinomial distribution, with conjugate Dirichlet priors.
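With conjugate Dirichlet priors, the local marginal likelihood of each node is available in closed form. As a point of reference only (the specific hyperparameters used by the package's default score are not spelled out here), the standard expression for node $j$ is

$$
p(x_j \mid \mathrm{pa}_j) = \prod_{k=1}^{q_j} \frac{\Gamma(\alpha_{jk})}{\Gamma(\alpha_{jk} + n_{jk})} \prod_{l=1}^{r_j} \frac{\Gamma(\alpha_{jkl} + n_{jkl})}{\Gamma(\alpha_{jkl})},
$$

where $n_{jkl}$ counts the samples in which node $j$ takes its $l$-th value while its parents are in their $k$-th joint configuration, $n_{jk} = \sum_l n_{jkl}$ and $\alpha_{jk} = \sum_l \alpha_{jkl}$. The product of these terms over nodes, multiplied by the structural prior, gives the (unnormalised) posterior that the samplers target.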
Data must be supplied as a data.frame with p columns (corresponding to the p random variables) and n rows (corresponding to the n samples). Each column must be a factor variable.
library(structmcmc)  # load the package (and its dependencies)
library(lattice)     # for the levelplot() and splom() generics used below
x1 <- factor(c("a", "a", "g", "c", "c", "a", "g", "a", "a"))
x2 <- factor(c(2, 2, 4, 3, 1, 4, 4, 4, 1))
x3 <- factor(c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE))
x <- data.frame(x1 = x1, x2 = x2, x3 = x3)
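A quick way to confirm that each column is stored as a factor, as required:
str(x)  # each of the three columns should be listed as a Factor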
Draw samples from the posterior using MC3.
set.seed(1234)
initial <- bn(c(), c(), c())          # empty graph on the 3 nodes, used as the starting point
priorgraph <- bn(c(), c(1), c(2))     # parent sets {}, {1}, {2}, i.e. the graph 1 -> 2 -> 3
prior <- priorGraph(priorgraph, 0.5)  # structural prior favouring this graph
mcmc <- posterior(data = x, method = "mc3", prior = prior,
                  nSamples = 10000, nBurnin = 1000, initial = initial)
Compute and plot estimated edge probabilities given by the MCMC run.
epmcmc <- ep(mcmc)
levelplot(epmcmc)
Since this is a problem with p = 3, we can compute the posterior edge probabilities by exhaustive enumeration. This is only feasible for p <= 6 or so.
exact <- posterior(x, "exact", prior = prior)
epexact <- ep(exact)
levelplot(epexact)
Comparing multiple MCMC runs
mcmc2 <- posterior(data = x, method = "mc3", prior = prior,
                   nSamples = 10000, nBurnin = 1000, initial = initial)
epmcmc2 <- ep(mcmc2)
levelplot(epmcmc2)
Compare the final edge probabilities between runs, and against the exact posterior.
splom(bnpostmcmc.list(mcmc, mcmc2))
levelplot(ep.list(exact = epexact, mcmc = epmcmc))
Plot how the cumulative edge probabilities change as samples are drawn.
splom(cumep(bnpostmcmc.list(mcmc, mcmc2)))
Plot how the moving-average edge probabilities change as samples are drawn.
splom(mwep(bnpostmcmc.list(mcmc, mcmc2)))
Plot how the cumulative total variation distance changes as samples are drawn.
exactgp <- gp(exact)
splom(cumtvd(exactgp, bnpostmcmc.list(mcmc, mcmc2)))
Each random variable is assumed to be normally distributed, with a G-prior.
Data must be supplied as a matrix with p columns (corresponding to the p random variables) and n rows (corresponding to the n samples). Each column must be a numeric variable.
x1 <- rnorm(20)
x2 <- rnorm(20)
x3 <- rnorm(20)
x <- matrix(c(x1, x2, x3), ncol = 3)
Draw samples from the posterior using MC3.
set.seed(1234)
initial <- bn(c(), c(), c())
mcmc <- posterior(data = x, method = "mc3",
                  logScoreFUN = logScoreNormalFUN(),  # use the Normal (G-prior) score
                  nSamples = 10000, nBurnin = 1000, initial = initial)
Compute and plot estimated edge probabilities given by the MCMC run.
epmcmc <- ep(mcmc)
levelplot(epmcmc)
Since this is a problem with p = 3, we can compute the posterior edge probabilities by exhaustive enumeration. This is only feasible for p <= 6 or so.
exact <- posterior(x, "exact", logScoreFUN = logScoreNormalFUN())
epexact <- ep(exact)
levelplot(epexact)
Comparing multiple MCMC runs
mcmc2 <- posterior(data = x, method = "mc3",
                   logScoreFUN = logScoreNormalFUN(),
                   nSamples = 10000, nBurnin = 1000, initial = initial)
epmcmc2 <- ep(mcmc2)
levelplot(epmcmc2)
Compare the final edge probabilities between runs, and against the exact posterior.
splom(bnpostmcmc.list(mcmc, mcmc2))
levelplot(ep.list(exact = epexact, mcmc = epmcmc))
Plot how the cumulative edge probabilities change as samples are drawn.
splom(cumep(bnpostmcmc.list(mcmc, mcmc2)))
Plot how the moving-average edge probabilities change as samples are drawn.
splom(mwep(bnpostmcmc.list(mcmc, mcmc2)))
Plot how the cumulative total variation distance changes as samples are drawn.
exactgp <- gp(exact)
splom(cumtvd(exactgp, bnpostmcmc.list(mcmc, mcmc2)))
Download the current version and unzip the file. Then install in R using the following, where rjbgoudie-structmcmc-XXXXX is the name of the unzipped directory/folder, and path/to/rjbgoudie-structmcmc-XXXXX is the path to this folder.
install.packages("path/to/rjbgoudie-structmcmc-XXXXX", repos = NULL, type = "source")
The package depends on parental, which provides a very lightweight directed graph object and basic manipulation tools for R. Also required are lattice and grid, both of which are included with R.
gRain, nnet, reshape and zoo are also recommended, and can be installed from CRAN, e.g. using
install.packages("reshape")
Goudie, R. J. B., & Mukherjee, S. (2016). A Gibbs Sampler for Learning DAGs. Journal of Machine Learning Research, 17(30), 1-39. http://jmlr.org/papers/v17/14-486.html