# load package
devtools::load_all(quiet=TRUE)

# set up simulation
library(gtools)  # provides rdirichlet()

# draw a component label for each of N observations: row i of P holds
# mixing weights drawn from Dirichlet(p), and z_i is sampled from those
# weights by inverting the cumulative probabilities with a uniform draw
simulateZ <- function(N, p){
    P <- rdirichlet(N, p)
    cumP <- t(apply(P, 1, cumsum))
    u <- runif(N)
    zz <- rep(NA_integer_, N)
    zz[u < cumP[, 1]] <- 1
    for(k in 2:ncol(P)){
        zz[u < cumP[, k] & u >= cumP[, k - 1]] <- k
    }
    zz
}

# true parameters: component weights, means, and standard deviations
p <- c(0.3, 0.7)
theta <- c(0, 2)
sigma <- c(1, 3)

# simulate labels and observations
z <- simulateZ(100, p)
y <- rnorm(100, theta[z], sigma[z])

# theta priors: theta ~ N(mu.0, tau.20)
mu.0 <- 0
tau.20 <- 5

# precision priors: 1/s^2 ~ gamma(v.0/2, v.0 * sigma.20/2)
sigma.20 <- 1
v.0 <- 5

# p prior: p ~ beta(a, b)
a <- 2
b <- 1

# number of iterations (S) and burn-in (B)
S <- as.integer(1e5)
B <- as.integer(1e4)

# explore joint posterior
set.seed(42)
time <- system.time({
    PHI <- sampler(y, mu.0, tau.20, sigma.20, v.0, a, b, S, B)
})[1]

Introduction

Based on data simulated from a mixture of normal distributions, we investigated the performance of a simplified Reversible Jump Markov Chain Monte Carlo (RJ MCMC) posterior approximation. We evaluated the accuracy and efficiency of this technique by comparing its output to the known parameters and by analyzing common MCMC diagnostics. Based on our chain of `r S` iterations, we conclude that some parameters, such as the choice between a single normal and a mixture of normals model and the binomial $p$ value associated with the mixture of normals model, are accurately estimated. However, other parameters, such as the means and precisions of the component normal distributions in the normal mixture model, are not efficiently approximated given the computational effort used.

Data Simulation

Simulated data were generated from a two component additive mixture of normal distributions. A vector of one hundred independent and identically distributed binary component labels was generated, with probability $0.3$ of assigning a particular replicate to the lower of the two normal distributions. The first component normal distribution had parametric mean and precision values of $0$ and $1$ respectively. The second component normal distribution had a parametric mean and precision of $2$ and $\frac{1}{9}$ respectively. The replicates drawn from these two normal distributions were then concatenated and used as data in the simplified RJ MCMC analysis.

Prior Distributions

Both the single normal model and the normal mixture model were given equal prior probability. In the normal mixture model, the mixing weight $p$ (the prior probability that a replicate belongs to the lower normal component, with complementary probability $1-p$ for the higher component) was given a beta prior with shape parameters $a = 2$ and $b = 1$. The prior distributions for the parameters of the additive normal components and the single normal model ($\theta_1$, $\theta_2$, $s_1$, $s_2$) were taken from standard conjugate families. The $\theta$ parameters were given normal priors with mean hyperparameter $\mu_0 = 0$ and variance hyperparameter $\tau_0^2 = 5$ for both normal components and for the single normal model. The priors for the standard deviations $s$ of the normal components were induced by giving the precisions $\frac{1}{s^2}$ gamma priors with shape and rate parameters $\frac{\nu_0}{2}$ and $\frac{\nu_0\sigma_0^2}{2}$ respectively. The prior standard deviation $\sigma_0$ was $1$ for both components and the single normal model, with the degrees of freedom hyperparameter $\nu_0$ set to $5$ for all normal distributions.
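
As a quick sanity check, draws from these priors can be simulated directly in R. The following is a minimal sketch using the hyperparameters defined above; it is not part of the sampler itself:

# simulate from the priors described above (sketch only)
p.prior     <- rbeta(1e4, a, b)                         # mixing weight p
theta.prior <- rnorm(1e4, mu.0, sqrt(tau.20))           # component means
prec.prior  <- rgamma(1e4, v.0 / 2, v.0 * sigma.20 / 2) # precisions 1/s^2
s.prior     <- 1 / sqrt(prec.prior)                     # implied standard deviations
summary(s.prior)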

Simplified RJ MCMC specifics

The simplified RJ MCMC sampler was implemented as an R function that relies on compiled C++ code for all computations, using the R package Rcpp, the C++11 standard, and Boost. Each iteration of the sampler could select either a single normal model ($k=1$) or a mixture of two normal distributions ($k=2$), and therefore approximated the trans-dimensional estimation capabilities of a full RJ MCMC algorithm. However, this simplification severely limited the number of extra parameters that could be proposed.
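
For readers unfamiliar with this pattern, the sketch below shows how Rcpp compiles a C++ function and binds it into the R session. This is an illustrative toy example only, not the package's actual sampler code:

library(Rcpp)

# compile a toy C++ function and expose it to R
cppFunction('
double sumSq(NumericVector x) {
    double out = 0.0;
    for (int i = 0; i < x.size(); ++i) out += x[i] * x[i];
    return out;
}')

sumSq(y)  # callable from R like any other function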

The sampling algorithm was called as a function in R with arguments for the data and the prior distributions specified as above. The sampler was run for `r S` iterations (in only `r round(time, 2)` seconds!) and the resulting chain was visually inspected for convergence to stationarity. For the first `r B` iterations an adaptive updating scheme was used, in which the proposal for the delta variables assigning each data replicate to a particular component distribution was altered if, over the previous $100$ iterations, the proposal rejection rate was not between $20$ and $30$ percent. The first `r B` iterations were then removed as burn-in.
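
The adaptation rule itself lives in the compiled sampler; one possible form of such a rule, shown purely for illustration under the assumption of a tunable proposal step size, is:

# illustrative adaptation rule (assumed form, not the package's actual code):
# widen proposals when too few are rejected, narrow them when too many are
tune.proposal <- function(step, n.reject, window = 100) {
    reject.rate <- n.reject / window
    if (reject.rate < 0.20) step * 1.1
    else if (reject.rate > 0.30) step * 0.9
    else step
}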

At each iteration, one of the parameters $\theta_1, \theta_2, s_1, s_2, p, k, x_1, \ldots, x_n$ was possibly updated by stochastically generating a proposal state and accepting it with a probability proportional to the likelihood ratio of the proposal state to the previous state. The first five parameters listed above were updated by resampling from a normal proposal distribution. The categorical $k$ parameter was updated at each step with a uniform probability ($0.5$) of proposing either the single normal model or the mixture of normals model. When the current value of $k = 1$ (i.e. estimation of the single normal distribution model), the values of $\theta_2$, $s_2$, and $x_1, \ldots, x_n$ were set to $0$.
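
As an illustration of this kind of update, a Metropolis step for $\theta_1$ might look like the following R sketch. The `log.lik` helper is hypothetical and the actual updates run in compiled C++; note that the full acceptance ratio also includes the prior ratio alongside the likelihood ratio:

# hypothetical mixture log-likelihood (sketch only)
log.lik <- function(theta1, theta2, s1, s2, p, y) {
    sum(log(p * dnorm(y, theta1, s1) + (1 - p) * dnorm(y, theta2, s2)))
}

# one Metropolis update of theta1 with a normal random-walk proposal
update.theta1 <- function(theta1, theta2, s1, s2, p, y, step = 0.5) {
    prop <- rnorm(1, theta1, step)
    log.r <- log.lik(prop, theta2, s1, s2, p, y) -
        log.lik(theta1, theta2, s1, s2, p, y) +
        dnorm(prop, mu.0, sqrt(tau.20), log = TRUE) -
        dnorm(theta1, mu.0, sqrt(tau.20), log = TRUE)
    if (log(runif(1)) < log.r) prop else theta1  # accept or keep current value
}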

Supplement

library(coda)

# column 106 holds the model indicator k; split the chain by visited model
phi1 <- PHI[PHI[, 106] == 1, ]  # single normal model iterations
phi2 <- PHI[PHI[, 106] == 2, ]  # mixture model iterations

# k=1

plot(mcmc(phi1[, 1]), main="theta")
plot(mcmc(phi1[, 3]), main="precision")

theta1k1acf <- acf(phi1[, 1], plot=FALSE)
plot(theta1k1acf, main=expression(paste("Autocorrelation plot for ", theta[1], ". ", k==1, " model")))

prec1k1acf <- acf(phi1[, 3], plot=FALSE)
plot(prec1k1acf, main=expression(paste("Autocorrelation plot for ", 1/sigma[1], ". ", k==1, " model")))

# k=2
plot(mcmc(phi2[, 1]), main="theta1")
plot(mcmc(phi2[, 2]), main="theta2")
plot(mcmc(phi2[, 3]), main="precision1")
plot(mcmc(phi2[, 4]), main="precision2")
plot(mcmc(phi2[, 5]), main="p")

theta1k2acf <- acf(phi2[, 1], plot=FALSE)
plot(theta1k2acf, main=expression(paste("Autocorrelation plot for ", theta[1], ". ", k==2, " model")))

theta2k2acf <- acf(phi2[, 2], plot=FALSE)
plot(theta2k2acf, main=expression(paste("Autocorrelation plot for ", theta[2], ". ", k==2, " model")))

prec1k2acf <- acf(phi2[, 3], plot=FALSE)
plot(prec1k2acf, main=expression(paste("Autocorrelation plot for ", 1/sigma[1], ". ", k==2, " model")))

prec2k2acf <- acf(phi2[, 4], plot=FALSE)
plot(prec2k2acf, main=expression(paste("Autocorrelation plot for ", 1/sigma[2], ". ", k==2, " model")))

pk2acf <- acf(phi2[, 5], plot=FALSE)
plot(pk2acf, main=expression(paste("Autocorrelation plot for binomial ", p, ". ", k==2, " model")))
# effective sample sizes across both models' parameters
ess1 <- effectiveSize(phi1[, c(1, 3)])
ess2 <- effectiveSize(phi2[, 1:5])
min.ess <- as.integer(min(c(ess1, ess2)))
max.ess <- as.integer(max(c(ess1, ess2)))

# posterior model probabilities from the model indicator column
k.2 <- round(mean(PHI[, 106] == 2), 2)
k.1 <- round(mean(PHI[, 106] == 1), 2)

Posterior summary and diagnostics

The sampler described above produced a single chain of MCMC output. This output was analyzed as a matrix with $106$ columns representing the iterated posterior parameter estimates, and a number of rows equal to the number of iterations retained after burn-in (e.g. $9\times10^4$ for $S = 10^5$ iterations with the first $B = 10^4$ removed). The effective sample sizes for the output parameter estimates ranged from `r min.ess` to `r max.ess`, and were generally lower for the precision parameters of the more complex model.

The resulting posterior marginal probability of the accurate model (the mixture of normals) was `r k.2`, about twice the probability of the alternative single normal model (`r k.1`). However, the MCMC diagnostics computed for the output chain show that the parameter updates of the simpler single normal model had lower lag autocorrelations that also decayed more quickly. The slower mixing of the normal mixture model is also apparent in the shapes of the posterior marginal distributions of its parameters. For example, the histograms of both precision parameters and of the lower normal distribution's mean show heavy tails and are probably poor approximations of the posterior distributions of these parameters.

Comparison of model to simulation parameters and conclusions

While the posterior distribution generated by the sampler does correctly place more posterior probability on the normal mixture model, the point estimates of the parameter values in the normal mixture model have varying degrees of accuracy when compared with the values used in the data simulation. The estimate of the lower distribution's precision parameter was particularly inaccurate, and the $95$ percent highest posterior density (HPD) interval generated from the posterior sample does not contain the true value used in the data simulation. The other parameters' HPD intervals contain their corresponding true values, but the point estimates of the means of the two component normal distributions are not close to their true values. Therefore, if these location parameters are the primary concern of future analyses, it is recommended that one million or more iterations of the sampler be computed. The point estimates of the binomial $p$ value and of the higher normal distribution's precision parameter are fairly close to their true values even with just one hundred thousand iterations.
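
For reference, HPD intervals like those discussed above can be computed from the retained mixture-model draws with coda, e.g.:

# 95% HPD intervals for theta1, theta2, the two precisions, and p
HPDinterval(mcmc(phi2[, 1:5]), prob = 0.95)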

Development

To view the code and follow the future development of this project, please visit github.com/jacobcvt12/bm1_project.


