In kcucchi/exPrior: Prior Distributions Using a Bayesian Hierarchical Framework

knitr::opts_chunk$set(fig.width = 7, fig.height = 5)

In most cases, the data used in assimilation are spatially correlated, since measurements are usually collected in a clustered way. The data assimilation model of exPrior can in principle account for patterns of spatial variability by using multivariate distributions as site-specific distributions. First, we should load the libraries to get access to this function:

library(devtools)
load_all()
library(exPrior)
library(gstat)

Using this model, we will use spatially correlated, synthetic data from 3 different sites. Other that that, we will follow the set up of of the vignette "Using exPrior by virtue of a simple example". To assure the spatial correlation in the data, we will use data that was generated with a statistical algorithm from the gstat package

set.seed(1)
xy <- data.frame("x" = sample(seq(0.00,1.00,0.01),22),
                   "y" = sample(seq(0.00,1.00,0.01),22))
model = vgm(psill=1, range=1, model='Exp')
g.dummy <- gstat(formula=z~1, locations=~x+y, dummy=TRUE, beta=1, model=model, nmax=20)
exdata_spatial <- predict(g.dummy, newdata=xy, nsim=1)

colnames(exdata_spatial)[3] <- "val"
exdata_spatial$site_id = c(rep("S1", 10), rep("S2", 5), rep("S3", 7))
exdata_spatial[ 1:10, 'val'] <- exdata_spatial[ 1:10, 'val'] - 3
exdata_spatial[11:15, 'val'] <- exdata_spatial[11:15, 'val'] - 2.5
exdata_spatial[16:22, 'val'] <- exdata_spatial[16:22, 'val'] - 3.5

The resulting dataframe then looks like this

exdata_spatial

Now that we have generated our dataframe exdata_spatial, containing the data together with the spatial coordinates, we can start inferring the prior distribution. Let us start by running genExPrior with these data but without accounting for the spatial correlation and look at the uninformative and informative distribution of $\theta$ using plotExPrior.

theta <- seq(from=-10, to=10, by=0.1)
resExPrior = genExPrior(exdata=exdata_spatial, theta=theta)
plotExPrior(resExPrior, plotExData=TRUE)

As can be seen, the results are along the lines of the example in the vignette "Using exPrior by virtue of a simple example". Now let us redo the inference but accounting for the spatial correlations in the data and visualize the results. This is done by setting the flag spatialCoordinates to TRUE.

resExPrior_spatial = genExPrior(exdata=exdata_spatial, theta=theta, spatialCoordinates=TRUE)
plotExPrior(resExPrior_spatial, plotExData=TRUE)

If we compare both the prior distributions, one without accounting for spatial correlation and one with accounting for it, we see overall similar results. The main difference is that the latter shows a somewhat increased uncertainty, i.e., wider variance. The fact that the more realistic model produces more uncertain results may seem surprising at first. However, the aim of Bayesian inference is not to reduce the uncertainty as much as possible but to correctly represent the uncertainty in the used data and the model.