MCMC with Parallel Chains

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(BayesMallows)

This vignette describes how to run Markov chain Monte Carlo with parallel chains. For an introduction to the "BayesMallows" package, please see @sorensen2020.

Why Parallel Chains?

Modern computers have multiple cores, and on computing clusters one can get access to hundreds of cores easily. By running Markov Chains in parallel on $K$ cores, ideally from different starting points, we achieve at least the following:

  1. The time you have to wait to get the required number of post-burnin samples scales like $1/K$.

  2. You can check convergence by comparing chains.

Parallel Chains with Complete Rankings

In "BayesMallows" we use the "parallel" package for parallel computation. Parallelization is obtained by starting a cluster and providing it as an argument. Since the limit to parallelism for vignettes being built on CRAN is 2, we here start a cluster spanning two cores, but in real applications this number should typically be larger (the output from parallel::detectCores() can be a good guide). Note that we also give one initial value of the dispersion parameter $\alpha$ to each chain.

library(parallel)
cl <- makeCluster(2)
fit <- compute_mallows(
  rankings = potato_visual, nmc = 5000,
  cl = cl
)
stopCluster(cl)

We can assess convergence in the usual way:

assess_convergence(fit)

We can also assess convergence for the latent ranks $\boldsymbol{\rho}$. Since the initial value of $\boldsymbol{\rho}$ is sampled uniformly, the two chains automatically get different initial values.

assess_convergence(fit, parameter = "rho", items = 1:3)

Based on the convergence plots, we set the burnin to 700.

fit$burnin <- 700

We can now use all the tools for assessing the posterior distributions as usual. The post-burnin samples for all parallel chains are simply combined, as they should be.

Below is a plot of the posterior distribution of $\alpha$.

plot(fit)

Next is a plot of the posterior distribution of $\boldsymbol{\rho}$.

plot(fit, parameter = "rho", items = 4:7)

Parallel Chains with Pairwise Preferences

A case where parallel chains might be more strongly needed is with incomplete data, e.g., arising from pairwise preferences. In this case the MCMC algorithm needs to perform data augmentation, which tends to be both slow and sticky. We illustrate this with the beach preference data, again referring to @sorensen2020 for a more thorough introduction to the aspects not directly related to parallelism.

We start by generating the transitive closure:

beach_tc <- generate_transitive_closure(beach_preferences)

Next we run two parallel chains, letting the package generate random initial rankings, but again providing a vector of initial values for $\alpha$.

cl <- makeCluster(2)
fit <- compute_mallows(
  preferences = beach_tc, nmc = 4000,
  alpha_init = runif(2, 1, 4),
  save_aug = TRUE, cl = cl
)
stopCluster(cl)

Trace Plots

The convergence plots shows some long-range autocorrelation, but otherwise it seems to mix relatively well.

assess_convergence(fit)

Here is the convergence plot for $\boldsymbol{\rho}$:

assess_convergence(fit, parameter = "rho", items = 4:6)

To avoid overplotting, it's a good idea to pick a low number of assessors and chains. We here look at items 1-3 of assessors 1 and 2.

assess_convergence(fit,
  parameter = "Rtilde",
  items = 1:3, assessors = 1:2
)

Posterior Quantities

Based on the trace plots, the chains seem to be mixing well. We set the burnin to 700 again.

fit$burnin <- 700

We can now study the posterior distributions. Here is the posterior for $\alpha$. Note that by increasing the nmc argument to compute_mallows above, the density would appear smoother. In this vignette we have kept it low to reduce the run time.

plot(fit)

We can also look at the posterior for $\boldsymbol{\rho}$.

plot(fit, parameter = "rho", items = 6:9)

We can also compute posterior intervals in the usual way:

compute_posterior_intervals(fit, parameter = "alpha")
compute_posterior_intervals(fit, parameter = "rho")

And we can compute the consensus ranking:

compute_consensus(fit)
compute_consensus(fit, type = "MAP")

We can compute the probability of being top-$k$, here for $k=4$:

plot_top_k(fit, k = 4)

References



Try the BayesMallows package in your browser

Any scripts or data that you put into this service are public.

BayesMallows documentation built on Nov. 25, 2023, 5:09 p.m.