The following document provides example code for implementing nonparametric models based on shrinkage priors and Markov random fields (SPMRF) as described in Faulkner and Minin (2018) using the R
package spmrf
. You can simply copy and paste the code presented here to your R
terminal window or to a script file to run it. We use Hamiltonion Monte Carlo (HMC) to target the posterior distributions of the parameters. For this, we take advantage of the computing package Stan
, which offers a straight-forward and efficient interface for implementing HMC. Some preliminary setup is necessary to install the needed software, which we describe in what immediately follows.
All code presented here is for the R
computing environment. If you do not already have R
, then please visit the R
CRAN website and follow instructions for installation. The model fitting depends on the RStan
interface to the Stan
computing language. If you do not already have the R
package rstan
installed, then please follow the RStan Getting Started Guide before proceeding. If you are using a Windows platform, you may need to first install the Rtools
package if you do not already have it. If you are using a Mac, you need to make sure Xcode
is installed. Please make sure you have rstan
successfully installed before proceeding further.
Once you have rstan
installed, you just have to load the package using
library(rstan)
Next you need to install the devtools
package, which will provide functions to allow easy installation of the spmrf
package from GitHub. You can easily install devtools
using install.packages("devtools")
. Once devtools
is installed, you are ready to install the spmrf
package. This package is hosted on GitHub at http://github.com/jrfaulkner/spmrf, and can be installed using
intstall_github("jrfaulkner/spmrf", build_vignettes=TRUE)
Once you have spmrf
installed, attach it using
library(spmrf)
Now you are ready to start fitting models.
Here we use two examples of fitting models where data are simulated from known functions. The two functions were used in simulations presented in Faulkner and Minin (2018). The first function is a piecewise constant function and the second is a function with varying smoothness (also known as the "Mexican hat" function). These functions are best fit with models that allow adaptation to the sharp local features that the functions exhibit.
In Faulkner and Minin (2018) we compare three different configurations for the prior distributions on the order-k differences in latent function values (normal, Laplace, and horseshoe). The normal prior is the default used in Gaussian Markov random field (GRMF) models, and is not considered locally adaptive. The Laplace and horseshoe priors are types of shrinkage priors. These two formulations allow some locally-adaptive behavior, with the horseshoe being the most adaptive of the two. Given the results in Faulkner and Minin (2018), we recommend using the horseshoe prior as a default prior because it provides the better adaptive smoothing properties than the Laplace prior. More extreme shrinkage priors than the horseshoe have been developed that could provide better locally-adaptive smoothing, but we currently only have code available for the Laplace and horseshoe shrinkage priors. We encourage you to explore different prior formulations with your own modifications to our code.
The following code sets up the piecewise constant function and generates a set of independent normally distributed observations. For this example, the standard deviations of the observations is 4.5, which corresponds to a signal-to-noise ratio of 2. The generated plot shows the true function and the simulated data.
# Set up function nx <- 100 xv <- (1:nx)/nx mpc <- rep(25, nx) mpc[xv >= 0.2 & xv < 0.4] <- 10 mpc[xv >= 0.4 & xv < 0.6] <- 35 mpc[xv >= 0.6] <- 15 # Generate data set.seed(3) pc.sd <- 4.5 pc.norm <- rnorm(n=nx, mean=mpc, sd=pc.sd) # Create data list for passage to spmrf pcdat.norm <- list(J = nx, y = pc.norm) # Plot function and data plot(1:nx, pc.norm, xlab="t", ylab="y", main="Piecewise Constant Function") lines(1:nx, mpc, lwd=2, col="red") legend('topright', legend="Truth", lty=1, lwd=2, col="red", bty="n")
Now we are ready to fit some models using the spmrf
function. It is important to know that this function is a wrapper for the stan
function in the rstan
package. The spmrf
function has a few arguments for describing the model you want to run, but it also accepts most of the arguments of the stan
function (see spmrf
documentation for details). Given the overlap between the spmrf
and stan
functions, it would be beneficial to familiarize yourself with the stan
function and its inputs.
Some of the model parameters may not be of interest in posterior summaries, including those that the model code generates due to parameter transformations. We can tell the spmrf
function to give us output only for the parameters of interest. We set up the following character vectors to specify the parameters of interest for later summaries. We are really only interested in the theta (latent trend) parameters, but have included others for posterior diagnostic checks (see Faulkner and Minin (2017) for description of models and associated parameters).
# Parameters to keep # for model with normal prior pars.N <- c("theta", "gam", "sigma") # for model with Laplace prior pars.L <- c("theta", "tau", "gam", "sigma") # for model with horseshoe prior pars.H <- c("theta", "tau", "gam", "sigma")
The rstan
package is built on C++ code. Before running a model, the model code needs to be compiled, which can take 30-60 seconds or so for each model. If you need to run a model multiple times and all you need to change for each run is the input data, or the number of iterations, warmup (burn-in) period, thinning level or something else without changing the structure of the model itself, you can compile a model object once and then refer to that object in subsequent calls to spmrf
without re-compiling each time. Next we create initial model fit objects that can be called upon many times without re-compiling. This is not necessary in the examples we present here, but is shown so you are aware of the possibility.
# Compile model objects # -- GMRF ipcfit.N <- spmrf(prior="normal", likelihood="normal", order=1, zeta=0.01, data=pcdat.norm, chains=0) # -- Laplace ipcfit.L <- spmrf(prior="laplace", likelihood="normal", order=1, zeta=0.01, data=pcdat.norm, chains=0) # -- Horseshoe ipcfit.H <- spmrf( prior="horseshoe", likelihood="normal", order=1, zeta=0.01, data=pcdat.norm, chains=0)
Next we pass these compiled objects to spmrf
for the actual MCMC runs by specifying the compiled fit objects with the fit
argument. Before that we need to set up the MCMC run parameters common to each run. We run 4 chains for each model with each chain having a burn-in (warm-up) period of 500 draws and a thinning level of every 5 samples. The total number of posterior samples we want to keep here will be 2000, which means each chain requires 3000 total iterations including burn-in and before thinning. Depending on the configuration of your computer, the model with normal prior may take about 10-20 seconds per chain to run, while the other two models may take 60-120 seconds per chain, with the horsehoe model taking the longest. Note that we make some adjustments to some of the tuning parameters for the NUTS algorithm in stan
to allow better adaptation during warmup and subsequently better performance during sampling. Divergent transitions after warmup may still occur, but a small number of them is not a problem if other posterior diagnostics look acceptable. You can increase the adapt_delta
parameter to decrease the step size and potentially reduce the number of divergent transitions.
# MCMC run settings nchain <- 4 ntotsamp <- 2000 nthin <- 5 nburn <- 1000 niter <- (ntotsamp/nchain)*nthin + nburn # Model with normal prior pcfit.N <- spmrf(prior="normal", likelihood="normal", order=1, fit=ipcfit.N, data=pcdat.norm, par=pars.N, chains=nchain, warmup=nburn, thin=nthin, iter=niter, control=list(adapt_delta=0.96, max_treedepth=12)) # Model with Laplace prior pcfit.L <- spmrf(prior="laplace", likelihood="normal", order=1, fit=ipcfit.L, data=pcdat.norm, par=pars.L, chains=nchain, warmup=nburn, thin=nthin, iter=niter, control=list(adapt_delta=0.96, max_treedepth=12)) # Model with horseshoe prior pcfit.H <- spmrf( prior="horseshoe", likelihood="normal", order=1, fit=ipcfit.H, data=pcdat.norm, par=pars.H, chains=nchain, warmup=nburn, thin=nthin, iter=niter, control=list(adapt_delta=0.995, max_treedepth=12))
Now that the MCMC runs are complete, we want to extract the posterior samples from the model fit objects. We can do this in various ways using methods provided in the rstan
package. Specifically, these are extract
, as.array.stanfit
, as.matrix.stanfit
, and as.data.frame.stanfit
(Note: drop 'stanfit' from the function names when calling). The extract
function (as opposed to the others) allows you to keep the burn-in samples, but you have to be careful of the permuted=TRUE
default setting of extract
which permutes all of your samples -- not a good thing to do if you are trying to assess a trace plot or calculate autocorrelations. The as.array
function automatically drops the burn-in samples and puts your samples (not permuted) in a three-dimensional array, where the first dimension indexes the iterations, the second the chains, and the third the parameters. Next we use the as.array
function to extract the posterior samples. We also use the extract_theta
function from the spmrf
package to calculate the posterior median and 95% Bayesian credible intervals for the theta (latent trend) parameter vector.
# extract posterior samples pcout.N <- as.array(pcfit.N) pcout.L <- as.array(pcfit.L) pcout.H <- as.array(pcfit.H) # extract posterior median and 95% BCIs for theta theta.N <- extract_theta(pcfit.N, obstype="normal", alpha=0.05) theta.L <- extract_theta(pcfit.L, obstype="normal", alpha=0.05) theta.H <- extract_theta(pcfit.H, obstype="normal", alpha=0.05)
Here we provide some tools for assessing the mixing and convergence of the parameters. This can be a challenge because there are hundreds of parameters for each model. However, the task can be simplified some by focusing on the lower level parameters in the model hierarchy. These parameters may display convergence problems even when the higher level parameters appear to have good posterior sample properties. There are various tools in both the rstan
and spmrf
packages for assessing mixing and convergence, and you can also use functions in the coda
package to assess convergence. The rstan
function print.stanfit
(called with print
) will provide a table of summary statistics for each parameter including effective sample size estimates and the potential scale reduction factors (R_hat). The plot_chains
function in the spmrf
package allows you to plot the traces of the separate chains by parameter, with various options for transformations of parameters, colors of traces, and formatting of the plots. The following code produces plots for a few parameters from the horseshoe model fits, but the plots shown below are just for parameters 'theta[20]' and 'gam' (gamma).
# example trace plots for horseshoe model plot_trace(pcout.H, "theta[20]", pscale="original", stack=TRUE, colset="black") plot_trace(pcout.H, "tau[20]", pscale="log", stack=TRUE, colset="black") plot_trace(pcout.H, "gam", pscale="log", stack=TRUE, colset="black")
The plot_trend
function in the spmrf
package allows you to plot the posterior median and associated credible intervals for the theta parameters. There are various options for how the parameters are passed to the function, as well as options for colors and for plotting the data. The following is code used to create the embedded PNG file for the figure shown below.
``` {r eval=FALSE}
yrng <- c(0,45) png(filename='pc_plots.png', width=1500, height=420, res=200) par(mfrow=c(1,3), mar=c(2,1.5,1.5,1), oma=c(2,2,0,0)) plot_trend(theta=theta.N, obstype="normal", obsvar=pc.norm, xvar=1:nx, pt.cex=0.9, main="Normal", xlab="", ylab="", ylim=yrng) lines(1:nx, mpc, lwd=2, lty=2, col="red") plot_trend(theta=theta.L, obstype="normal", obsvar=pc.norm, xvar=1:nx, pt.cex=0.9, main="Laplace", xlab="", ylab="", ylim=yrng) lines(1:nx, mpc, lwd=2, lty=2, col="red") plot_trend(theta=theta.H, obstype="normal", obsvar=pc.norm, xvar=1:nx, pt.cex=0.9, main="Horseshoe", xlab="", ylab="", ylim=yrng) lines(1:nx, mpc, lwd=2, lty=2, col="red") legend(x="topright", legend=c("Truth", "Median", "95% BCI"), col=c("red", "blue","lightblue"), lty=c(2,1,1), lwd=c(2,2,3), bty="n", cex=0.8) mtext(side=1, outer=T, line=1, text="t", font=2, cex=0.8) mtext(side=2, outer=T, line=1, text="y", font=2, cex=0.8) dev.off()
<img src="figure/pc_plots.png" height=196 width=700 /> ### Varying Smooth Function We repeat the above steps for fitting the "Mexican hat" (varying smoothness) function. For this example, we fit second-order models using the three formulations for the priors on the latent trend parameters (theta). First create the trend function and generate data and plot. ```r # Set up generating function nx <- 100 xv <- (1:nx)/nx ygfun <- function(x){ sin(x) + 2*exp(-30*x^2) } gseq <- seq(-2,2,length=101) mvs1 <- 20 + 10*ygfun(gseq) mvs <- mvs1[-1] # Generate data set.seed(3) vs.sd <- 4.5 vs.norm <- rnorm(n=nx, mean=mvs, sd=vs.sd) vsdat.norm <- list(J = nx, y = vs.norm) # Plot function and data plot(1:nx, vs.norm, ylim=c(0,45), xlab="t", ylab="y", col="gray50", main="Varying Smooth Function") lines(1:nx, mvs, lty=2, lwd=2, col="red") legend('topright', legend="Truth", lty=2, lwd=2, col="red", bty="n")
Next we fit the models without compiling separately and we keep the same parameters as in the example for the piecewise constant function and extract the posterior draws and posterior summaries for theta. Note that the second-order models take longer to run, with the normal model taking about 30-60 seconds per chain, the Laplace model about 2-3 minutes per chain, and the horseshoe about 7-9 minutes per chain.
# Compile and run spmrf models for varying smooth (vs) funtion # -- GMRF vsfit.N <- spmrf(prior="normal", likelihood="normal", order=2, data=vsdat.norm, par=pars.N, chains=nchain, warmup=nburn, thin=nthin, iter=niter, control=list(adapt_delta=0.96, max_treedepth=12)) # -- Laplace vsfit.L <- spmrf(prior="laplace", likelihood="normal", order=2, data=vsdat.norm, par=pars.L, chains=nchain, warmup=nburn, thin=nthin, iter=niter, , control=list(adapt_delta=0.96, max_treedepth=12)) # -- Horseshoe vsfit.H <- spmrf(prior="horseshoe", likelihood="normal", order=2, data=vsdat.norm, par=pars.H, chains=nchain, warmup=nburn, thin=nthin, iter=niter, , control=list(adapt_delta=0.995, max_treedepth=12)) # extract posteriors vsout.N <- as.array(vsfit.N) vsout.L <- as.array(vsfit.L) vsout.H <- as.array(vsfit.H) # create posterior summaries for theta thvs.N <- extract_theta(vsfit.N, obstype="normal", alpha=0.05) thvs.L <- extract_theta(vsfit.L, obstype="normal", alpha=0.05) thvs.H <- extract_theta(vsfit.H, obstype="normal", alpha=0.05)
The following code produces summaries for all models and a few trace plots for the horseshoe model, this time in a few different formats.
# print parameter summaries print(vsout.N, pars=pars.N) print(vsout.L, pars=pars.L) print(vsout.H, pars=pars.H) # example trace plots for horseshoe model plot_trace(vsout.H, "theta[20]", pscale="original", stack=TRUE, colset="color") plot_trace(vsout.H, "tau[20]", pscale="log", stack=FALSE, colset="black") plot_trace(vsout.H, "gam", pscale="log", stack=FALSE, colset="gray") plot_trace(vsout.H, "sigma", pscale="log", stack=TRUE, colset="black")
Finally we plot the posterior medians and credible intervals for the theta (latent trend) parameters for the three models and compare them to the line for the true function. The code below is for producing the plot shown below, but you can just pull out the calls to plot_trend
for your own purposes.
yrng <- c(0,45) png(filename='vs_plots.png', width=1500, height=420, res=200) par(mfrow=c(1,3), mar=c(2,1.5,1.5,1), oma=c(2,2,0,0)) plot_trend(theta=thvs.N, obstype="normal", obsvar=vs.norm, xvar=1:nx, pt.cex=0.9, main="Normal", xlab="", ylab="", ylim=yrng) lines(1:nx, mvs, lwd=2, lty=2, col="red") legend(x="topleft", legend=c("Truth", "Median", "95% BCI"), col=c("red", "blue","lightblue"), lty=c(2,1,1), lwd=c(2,2,3), bty="n", cex=0.8) plot_trend(theta=thvs.L, obstype="normal", obsvar=vs.norm, xvar=1:nx, pt.cex=0.9, main="Laplace", xlab="", ylab="", ylim=yrng) lines(1:nx, mvs, lwd=2, lty=2, col="red") plot_trend(theta=thvs.H, obstype="normal", obsvar=vs.norm, xvar=1:nx, pt.cex=0.9, main="Horseshoe", xlab="", ylab="", ylim=yrng) lines(1:nx, mvs, lwd=2, lty=2, col="red") legend(x="topleft", legend=c("Truth", "Median", "95% BCI"), col=c("red", "blue","lightblue"), lty=c(2,1,1), lwd=c(2,2,3), bty="n", cex=0.8) mtext(side=1, outer=T, line=1, text="t", font=2, cex=0.8) mtext(side=2, outer=T, line=1, text="y", font=2, cex=0.8) dev.off()
Some of these models can be quite time consuming to fit. If you have access to a computer with multi-core processors, you can greatly reduce the fitting time by parallelizing the processes. The following example uses the mclapply
function from the parallel
package to distribute posterior sampling of different chains to different processor cores. Make sure you have multi-core processors on your computer before running this.
# Sampling setting nchain <- 4 ntotsamp <- 2000 nthin <- 5 nburn <- 1000 niter <- (ntotsamp/nchain)*nthin + nburn chnlist <- 1:4 # Compile fit object ivsfit.H <- spmrf(prior="horseshoe", likelihood="normal", order=2, data=vsdat.norm, chains=0, par=pars.H) # Run model separately per chain -- one per core tmp.sflist.H <- mclapply(1:nchain, mc.cores = nchain, function(xx) spmrf(prior="horseshoe", likelihood="normal", order=2, fit=ivsfit.H, data=vsdat.norm, par=pars.H, chains=1, warmup=nburn, thin=nthin, iter=niter, chain_id=chnlist[xx], refresh=-1, control=list(adapt_delta=0.99, max_treedepth=12)) ) # Convert list to stanfit object vsfit.H <- sflist2stanfit(tmp.sflist.H) # Extract posteriors into array vsout.H <- as.array(vsfit.H)
Faulkner, J.R., and V.N. Minin. 2018. Locally adaptive smoothing with Markov random fields and shrinkage priors. Bayesian Analysis 13(1):225-252.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.