```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```
When presented with several competing formal models, one must select among the different explanations: a process commonly known as model selection. One of the more robust methods of performing model selection is through Bayes factors. However, Bayes factors require the computation of the marginal likelihood for each model, which, in turn, requires integrating over the entire parameter space. This makes estimating the Bayes factor for models with more than a few dimensions problematic for numerical integration techniques. Early Monte Carlo methods such as arithmetic/harmonic mean estimators or reversible-jump MCMC have been used with some success, but still face problems (Friel & Wyse, 2012). Other solutions exist, such as using GPUs (Evans & Brown, 2017), but they are challenging to implement and must be done on a model-by-model basis. In this vignette, we present two recent advancements, thermodynamic integration (Friel & Pettitt, 2008; Lartillot & Philippe, 2006) and steppingstone sampling (Xie, Lewis, Fan, Kuo, & Chen, 2011), and show how the `powder` package is used to estimate them. We use the Linear Ballistic Accumulator (LBA) model (Brown & Heathcote, 2008) as a working example.
Bayesian model selection is more easily introduced by considering the more familiar case of Bayesian parameter estimation. Recall Bayes' rule defines the posterior distribution as:
\begin{equation} p(\boldsymbol{\theta} |\boldsymbol{D},\mathcal{M}) = \frac{p(\boldsymbol{D} | \boldsymbol{\theta},\mathcal{M} ) p(\boldsymbol{\theta}|\mathcal{M})}{p(\boldsymbol{D}|\mathcal{M})} \end{equation}
where $p(\boldsymbol{D} | \boldsymbol{\theta},\mathcal{M})$ is the likelihood function, $p(\boldsymbol{\theta}|\mathcal{M})$ is the prior probability of the parameters, and $p(\boldsymbol{D}|\mathcal{M})$ is the marginal likelihood found by marginalizing over all possible parameter values.
While Bayesian parameter estimation is primarily concerned with estimating the posterior distribution, $p(\boldsymbol{\theta}|\boldsymbol{D},\mathcal{M})$, the quantity of interest in Bayesian model selection is the marginal likelihood:
\begin{equation} p(\boldsymbol{D}|\mathcal{M}) = \int p(\boldsymbol{D} | \boldsymbol{\theta},\mathcal{M})\, p(\boldsymbol{\theta}|\mathcal{M})\, d\boldsymbol{\theta} \end{equation}
After obtaining the marginal likelihood for each model, the Bayes factor, a measure of evidence provided by the data in favor of one model over the other, can then be computed:
\begin{equation} \frac{p(\boldsymbol{D} | \mathcal{M}_1)}{p(\boldsymbol{D} | \mathcal{M}_2)} \end{equation}
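To make these quantities concrete, here is a minimal sketch (not part of `powder`) that approximates the marginal likelihoods of two toy models by brute-force Monte Carlo, averaging the likelihood over draws from each model's prior. The data, priors, and function name are all hypothetical; the two models differ only in where their prior centers the location parameter.

```r
set.seed(1)
y <- rnorm(20, mean = 0.5, sd = 1)   # simulated data

# Brute-force Monte Carlo estimate of p(D): average the likelihood
# over draws from the prior, mu ~ N(prior.mean, 1)
marg.lik <- function(prior.mean, n.draws = 1e5) {
  mu  <- rnorm(n.draws, prior.mean, 1)
  lik <- vapply(mu, function(m) prod(dnorm(y, m, 1)), numeric(1))
  mean(lik)
}

# Bayes factor for model 1 (prior centered near the truth)
# over model 2 (prior centered far away)
marg.lik(0.5) / marg.lik(5)
```

This works for a one-parameter model, but the fraction of prior draws landing in the region of high likelihood shrinks rapidly as the number of parameters grows, so the brute-force average becomes hopelessly unstable.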
In most real-world cases, the marginal likelihoods must be obtained via Monte Carlo techniques. We describe thermodynamic integration and steppingstone sampling below:
Thermodynamic integration (TI) and steppingstone sampling are both based on a set of distributions in which the likelihood is raised to a power $t_{j}$ (called the \textit{temperature}) over a schedule $0 = t_{1} < t_{2} < \cdots < t_{k} = 1$. These distributions are referred to as \textit{power posteriors} and are defined as:
\begin{equation} p(\boldsymbol{\theta}|\boldsymbol{D},t_{j}) = \frac{p(\boldsymbol{D}|\boldsymbol{\theta})^{t_{j}}p(\boldsymbol{\theta})}{\int p(\boldsymbol{D} | \boldsymbol{\theta})^{t_{j}} p(\boldsymbol{\theta})\, d\boldsymbol{\theta}} \end{equation}
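At $t_{j}=0$ the power posterior reduces to the prior, and at $t_{j}=1$ it is the usual posterior, so the temperature schedule traces a path between the two. As a minimal illustration (a hypothetical example, not part of `powder`), for a normal model with known variance and a conjugate normal prior the power posterior is available in closed form:

```r
set.seed(1)
y <- rnorm(50, mean = 1, sd = 1)   # data from N(1, 1)
n <- length(y); sigma <- 1         # known data standard deviation
mu0 <- 0; tau0 <- 3                # prior: mu ~ N(mu0, tau0^2)

# Closed-form power posterior for mu at temperature t:
# proportional to N(y | mu, sigma^2)^t * N(mu | mu0, tau0^2)
power.posterior <- function(t) {
  prec <- t * n / sigma^2 + 1 / tau0^2
  c(mean = (t * sum(y) / sigma^2 + mu0 / tau0^2) / prec,
    sd   = sqrt(1 / prec))
}

# Columns move smoothly from the prior (t = 0) to the posterior (t = 1)
sapply(c(0, 0.1, 0.5, 1), power.posterior)
```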
After obtaining samples from each power posterior, $\boldsymbol{\theta}_{i,j} \sim p(\boldsymbol{\theta} |\boldsymbol{D},t_{j})$, the average log-likelihoods, $\frac{1}{n} \sum_{i=1}^{n}\ln p(\boldsymbol{D}|\boldsymbol{\theta}_{i,j})$, are computed. These form $k$ points along a one-dimensional curve with respect to $t$, and the area under this curve is an estimate of the log marginal likelihood. Since the curve is one-dimensional, its area is easily estimated with standard numerical integration techniques. Friel and Pettitt (2008) suggest the trapezoidal rule:
\begin{equation} \ln p(\boldsymbol{D}) \approx \sum_{j=2}^{k} \frac{t_{j}-t_{j-1}}{2} \left[\frac{1}{n} \sum_{i=1}^{n}\ln p(\boldsymbol{D}|\boldsymbol{\theta}_{i,j}) + \frac{1}{n} \sum_{i=1}^{n}\ln p(\boldsymbol{D}|\boldsymbol{\theta}_{i,j-1}) \right]. \end{equation}
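`powder` computes this estimate for you (via `summary()`, shown later), but the rule itself is only a few lines. Below is a hypothetical helper, not part of the package, where `temps` is the temperature schedule $t_{1},\ldots,t_{k}$ and `mean.loglik[j]` is the average log-likelihood under the $j$-th power posterior:

```r
# Trapezoidal-rule TI estimate of ln p(D)
ti.estimate <- function(temps, mean.loglik) {
  k <- length(temps)
  sum((temps[-1] - temps[-k]) / 2 * (mean.loglik[-1] + mean.loglik[-k]))
}
```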
The steppingstone estimator is also based on the power posteriors and is written as:
\begin{equation} p(\boldsymbol{D}) \approx \prod_{j=2}^{k} \frac{1}{n} \sum_{i=1}^{n} \frac{p(\boldsymbol{D} | \boldsymbol{\theta}_{i,j-1})^{t_{j}}}{p(\boldsymbol{D} | \boldsymbol{\theta}_{i,j-1})^{t_{j-1}}} \end{equation}
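Because the ratios involve likelihoods rather than log-likelihoods, a direct implementation overflows easily; in practice the estimator is computed on the log scale with a log-mean-exp trick. A hypothetical sketch (not part of `powder`), where `logliks` is an $n \times k$ matrix whose $j$-th column holds $\ln p(\boldsymbol{D}|\boldsymbol{\theta}_{i,j})$ for the $n$ draws at temperature $t_{j}$:

```r
# Log steppingstone estimate of ln p(D); each term uses draws from the
# previous temperature, t_{j-1}, raised to the power t_j - t_{j-1}
log.steppingstone <- function(temps, logliks) {
  k <- length(temps)
  total <- 0
  for (j in 2:k) {
    w <- (temps[j] - temps[j - 1]) * logliks[, j - 1]
    total <- total + max(w) + log(mean(exp(w - max(w))))  # stable log-mean-exp
  }
  total
}
```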
Now that we have an overview of the methods, let's use `powder` to define an LBA model, obtain the power posteriors, and compute the estimates of the marginal likelihood. First, we define a simple LBA model in which no parameters vary across conditions. We also include a contaminant process in which 2% of the trials are due to random contaminants, defined by a uniform distribution between 0 and 5 seconds.
```r
library(powder)

# Simple LBA model for a single subject, with a 2% uniform
# contaminant process on [0, 5] seconds
model = LBA.Individual$new(contaminant = list(pct = 2, upper.bound = 5))
```
There are many other LBA models that can be defined, but for now we will stick with a simple one. Next, we load our data. The "individual" dataset is simulated from the LBA for a single subject in two conditions, with no parameters varying across conditions.
```r
data('individual')
```
Next, we run the `powder` function to obtain the posterior samples. This runs Differential Evolution MCMC (Turner et al., 2014) over `num.temps` power posteriors, each containing `n.samples` samples after discarding `burnin` samples. The power posteriors are run sequentially, with the last sample of each power posterior used as the starting point for the next. When switching to the next power posterior, the sampler needs time to adapt; we set this adaptation period to `meltin` iterations.
```r
pow.out = powder(data = individual, model = model, num.temps = 30,
                 n.samples = 100, burnin = 200, meltin = 50)
```
As described in Evans and Annis (submitted), we can speed up the sampling process by sampling from the power posteriors in parallel, with each chain in DE-MCMC sampling from a different power posterior.
```r
pow.out.par = powder(data = individual, model = model, num.temps = 30,
                     n.samples = 800, burnin = 200, method = 'parallel')
```
Lastly, we can estimate the marginal likelihood with:
```r
summary(pow.out)
#                   Method         Value
# 1                     TI  5.618604e+02
# 2           TI Corrected  5.622093e+02
# 3          Harmonic Mean  5.757236e+02
# 4          Steppingstone 5.219464e+242
# 5      Log Steppingstone  5.621025e+02
# 6            TI Variance  2.653665e-03
# 7 Steppingstone Variance  2.263343e-02

summary(pow.out.par)
#                   Method         Value
# 1                     TI  5.628619e+02
# 2           TI Corrected  5.630752e+02
# 3          Harmonic Mean  5.739877e+02
# 4          Steppingstone 8.260357e+242
# 5      Log Steppingstone  5.628241e+02
# 6            TI Variance  3.445207e-03
# 7 Steppingstone Variance  4.691691e-02
```
The estimates from each method are similar to one another, except for the harmonic mean and the raw steppingstone estimator. The harmonic mean is generally considered a poor estimator of the marginal likelihood, as it usually overestimates it (Xie et al., 2011). Note that the raw steppingstone estimate will usually be extremely large or `NaN` because of numerical overflow, as it estimates $p(\boldsymbol{D})$ rather than $\ln p(\boldsymbol{D})$. These estimates are usually not stable, so the log steppingstone estimate should be used instead (Xie et al., 2011).
Brown, S. D., & Heathcote, A. (2008). The simplest complete model of choice response time: Linear ballistic accumulation. Cognitive Psychology, 57(3), 153–178. http://doi.org/10.1016/j.cogpsych.2007.12.002
Evans, N. J., & Brown, S. D. (2017). Bayes factors for the Linear Ballistic Accumulator Model of decision-making. Behavior Research Methods, Advance online publication.
Friel, N., & Pettitt, A. N. (2008). Marginal likelihood estimation via power posteriors. Journal of the Royal Statistical Society. Series B: Statistical Methodology, 70(3), 589–607. http://doi.org/10.1111/j.1467-9868.2007.00650.x
Friel, N., & Wyse, J. (2012). Estimating the evidence - A review. Statistica Neerlandica, 66(3), 288–308. http://doi.org/10.1111/j.1467-9574.2011.00515.x
Lartillot, N., & Philippe, H. (2006). Computing Bayes factors using thermodynamic integration. Systematic Biology, 55(2), 195–207. http://doi.org/10.1080/10635150500433722
Xie, W., Lewis, P. O., Fan, Y., Kuo, L., & Chen, M. H. (2011). Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Systematic Biology, 60(2), 150–160. http://doi.org/10.1093/sysbio/syq085