Using exPrior by virtue of a simple example

knitr::opts_chunk$set(fig.width = 7, fig.height = 5)

When performing hydro-geological investigation at a given site, measurements at that site $y^*$ are often used to characterize the parameter $\theta$ of a given model. By using the information from the previously studied sites, we wish to derive a prior distribution for a parameter $\theta$. The exPrior package allows a user to compute informative prior distribution(s) for a parameter $\theta$ using such external data.

For simplicity, this vignette only demonstrates the package's application in a three level hierarchical Bayesian model, which is:

With this set up, consider an example where data are available at 3 sites:

Using these data, we want to get the prior distribution of $\theta$. We'll solve this task by using the function genExPrior. This function will perform two steps:

  1. Given data from a number of sites, the function calculates the posterior distribution $p(\eta|y)$ for the hyperparameters $\eta$ of our statistical model. A Markov Chain Monte-Carlo (MCMC) method is employed to compute these posterior distributions.

  2. From these posterior hyperparameter distribution, this package derives the prior of $\theta$ at new site $S_0$, i.e. $p(\theta|y)$.

First, we should load the libraries to get access to this function:

library(devtools)
load_all()
library(exPrior)

Under the assumption that the site specific parameter follows a normal distribution, the function genExPrior takes in three parameters. First, exdata is a data frame where the first column contains the data and the second column is a site index where the data come from. Second, $\theta$ is a vector of numerical values where to evaluate the prior distribution. Finally, niter is an integer for the sample size in the MCMC that is used to evaluate unknown $\mu_i, \sigma^2_j$ at each site i ( i = 1, 2, 3 in our case). By default it is set to $10^5$, which is an effective sample size for MCMC. Users are free to choose a different sample size.

Putting the data of the three sites into dataframe, we have:

exdata = data.frame(val = c(c(-2,-3,-4), c(-2,-1), c(-6,-7,-2,-3)), 
                    site_id = c(rep('S1',3), rep('S2',2), rep('S3',4))) 
exdata
theta = seq(from=-10, to=10, by=0.1)

Running genExPrior with these arguments, we get the prior distribution for $\theta$ as well as the posterior hyperparameters of our Bayesian hierarchical model.

  resExPrior = genExPrior(exdata = exdata, theta = theta)

If the distribution of the parameter is not normal, genExPrior provides an option to transform the distribution to normal under user's choices. Two types of Johnson transformation, logarithm and log ratio, as well as Box-Cox transformation are provided. Lower and upper limit of log ratio, and value of $\lambda$ for Box-Cox transformation should be chosen so that the transformed data has normal distribution.

First, let us look at the posteriors of the hyperparameters, which are conditioned on the data in the exdata data frame. To that end, we use the function plotHyperDist with the results from genExPrior as input

    plotHyperDist(resExPrior)

Then, we can visualize both the uninformative and informative distribution of $\theta$ using plotExPrior. This function again takes as input the output from genExPrior as well as a Boolean asking whether to additionally plot the used data.

    plotExPrior(resExPrior, plotExData = TRUE)


Try the exPrior package in your browser

Any scripts or data that you put into this service are public.

exPrior documentation built on Nov. 15, 2019, 1:07 a.m.