knitr::opts_chunk$set( collapse = TRUE, comment = "#>", dev = "png", dev.args = list(type = "cairo-png"), fig.width = 7, fig.height = 5 )
options(width = 100) # sets width of output window
In this vignette we are going to see how to fit an SPDE to one-dimensional point data, i.e. data that consist of the points at which things are located, not the number of points in some area.
Load libraries
library(inlabru) library(INLA) library(mgcv) library(ggplot2) library(fmesher)
data(Poisson2_1D, package = "inlabru")
Take a look at the point (and frequency) data
ggplot(pts2) + geom_histogram(aes(x = x), binwidth = 55 / 20, boundary = 0, fill = NA, color = "black" ) + geom_point(aes(x), y = 0, pch = "|", cex = 4) + coord_fixed(ratio = 1)
Build a 1D mesh:
x <- seq(0, 55, length.out = 50) mesh1D <- fm_mesh_1d(x, boundary = "free")
Make the latent components for a 1D SPDE model, using an integrate-to-zero constraint for better identifiability:
matern <- inla.spde2.pcmatern(mesh1D, prior.range = c(150, 0.75), prior.sigma = c(0.1, 0.75), constr = TRUE ) comp <- ~ spde1D(x, model = matern) + Intercept(1)
Here we want to fit to the actual points, and the inlabru
functions that are
used for this are bru()
and like(..., family = "cp")
.
For this special case there is also a shortcut function lgcp()
(for 'Log Gaussian Cox Process'),
but it doesn't support all features. The standard way of specifying
the function space for integration is via the domain
argument.
fit.spde <- bru( comp, like(x ~ ., family = "cp", data = pts2, domain = list(x = mesh1D)) ) ## Equivalent call for this particular example: # fit.spde <- lgcp(comp, formula = x ~ ., data = pts2, domain = list(x = mesh1D))
Here, formula = x ~ .
means that the observed points are in x
, and .
denotes
a linear predictor that is the sum of all the latent components.
We can look at the posterior distributions of the parameters of the SPDE using the function spde.posterior
.
It returns x
and y
values for a plot of the posterior PDF in a data frame, which can be printed using
the plot
function. To see the PDF for the range parameter, for example:
post.range <- spde.posterior(fit.spde, name = "spde1D", what = "range") plot(post.range)
Look at the help file for spde.posterior
and then plot the posterior for the log of the SPDE range parameter, the
SPDE variance and/or log of the variance, and for the Matern covariance function. Make sure you understand the difference
between what is plotted for the range and variance parameters, and for the covariance function
(which involves both these parameters).
post.log.range <- spde.posterior(fit.spde, name = "spde1D", what = "log.range") plot(post.log.range) # SOLUTION post.variance <- spde.posterior(fit.spde, name = "spde1D", what = "variance") plot(post.variance) # SOLUTION post.log.variance <- spde.posterior(fit.spde, name = "spde1D", what = "log.variance") plot(post.log.variance) # SOLUTION post.matcorr <- spde.posterior(fit.spde, name = "spde1D", what = "matern.correlation") plot(post.matcorr) # SOLUTION
You can get a feel for sensitivity to priors by specifying different priors and looking at the posterior plots.
We can also now predict on any scale we want. For example, to predict on the 'response'
scale (i.e. the intensity function $\lambda(s)$), we call predict
thus:
predf <- data.frame(x = seq(0, 55, by = 1)) # Set up a data frame of explanatory values at which to predict pred_spde <- predict(fit.spde, predf, ~ exp(spde1D + Intercept), n.samples = 1000)
while to predict on the linear predictor scale (i.e. that of the log intensity,
$\log(\lambda(s))$), we call predict
thus:
pred_spde_lp <- predict(fit.spde, predf, ~ spde1D + Intercept, n.samples = 1000)
here's how to plot the prediction and 95% credible interval:
plot(pred_spde, color = "red") + geom_point(data = pts2, aes(x = x), y = 0, pch = "|", cex = 2) + xlab("x") + ylab("Intensity")
How does this compare with the underlying intensity function that generated the data?
The function lambda2_1D( )
in the dataset Poission2_1D
calculates the true intensity
that was used in simulating these data. In order to plot this, we make a data frame with
x
- and y
-coordinates giving the true intensity function, $\lambda(s)$. We use lots
of x
-values to get a nice smooth plot (150 values).
xs <- seq(0, 55, length = 150) true.lambda <- data.frame(x = xs, y = lambda2_1D(xs))
Plot the fitted and true intensity functions:
plot(pred_spde, color = "red") + geom_point(data = pts2, aes(x = x), y = 0, pch = "|", cex = 2) + geom_line(data = true.lambda, aes(x, y)) + xlab("x") + ylab("Intensity")
We can look at the goodness-of-fit of the mode using the inlabru
function bincount( )
,
which plots the 95% credible intervals in a specified set of bins along the x
-axis
together with the observed count in each bin:
The credible intervals are shown as red rectangles, the mean fitted value as a short
horizontal blue line, and the observed data as black points:
bc <- bincount( result = fit.spde, observations = pts2, breaks = seq(0, max(pts2), length = 12), predictor = x ~ exp(spde1D + Intercept) ) attributes(bc)$ggp
Abundance is the integral of the intensity over space. We estimate it by integrating
the predicted intensity over x
. Integration is done by adding up the intensity at
locations x
weighted by a particular weight. The locations x
and their weights
are constructed using the fm_int
function
ips <- fm_int(mesh1D, name = "x") head(ips) Lambda <- predict(fit.spde, ips, ~ sum(weight * exp(spde1D + Intercept)))
You can look at the abundance estimate by typing
Lambda
mean
is the posterior mean abundance.sd
is the estimated standard error of the posterior of the abundance.cv
is its estimated coefficient of variation (stander error divided by mean).q0.025
and q0.975
are the 95% credible interval bounds.q0.5
is the posterior median abundanceBut it is not quite that simple! The above posterior for abundance takes account
only of the variance due to us not knowing the parameters of the intensity function.
It neglects the variance in the number of point locations, given the intensity function.
To include this we need to modify predict( )
as follows:
Nest <- predict( fit.spde, ips, ~ data.frame( N = 50:250, dpois = dpois(50:250, lambda = sum(weight * exp(spde1D + Intercept)) ) ) )
This calculates the same statistics as were calculated for Lambda
, but for evey value of N
from 50 to 250,
rather than for the posterior mean N
alone:
Nest[Nest$N %in% 100:105, ]
We compute the 95% prediction interval and the median as follows
inla.qmarginal(c(0.025, 0.5, 0.975), marginal = list(x = Nest$N, y = Nest$mean))
Compare Lambda
to Nest
by plotting:
First calculate the posterior conditional on the mean of Lambda
Nest$plugin_estimate <- dpois(Nest$N, lambda = Lambda$mean)
Then plot it and the unconditional posterior
ggplot(data = Nest) + geom_point(aes(x = N, y = mean, colour = "Posterior")) + geom_line(aes(x = N, y = mean, colour = "Posterior")) + geom_ribbon( aes( x = N, ymin = mean - 2 * mean.mc_std_err, ymax = mean + 2 * mean.mc_std_err ), fill = "grey", alpha = 0.5 ) + geom_point(aes(x = N, y = plugin_estimate, colour = "Plugin")) + geom_line(aes(x = N, y = plugin_estimate, colour = "Plugin")) + ylab("pmf")
Do the differences make sense to you?
Now refit a GAM for the count data of Poisson2_1D
and plot the
estimated intensity function from this GAM fit, together with the LGCP fitted above
and the true intensity.
cd2 <- countdata2 fit2.gam <- gam(count ~ s(x, k = 10) + offset(log(exposure)), family = poisson(), data = cd2) dat4pred <- data.frame(x = seq(0, 55, length = 100), exposure = rep(cd2$exposure[1], 100)) pred2.gam <- predict(fit2.gam, newdata = dat4pred, type = "response") dat4pred2 <- cbind(dat4pred, gam = pred2.gam) # SOLUTION
You should get a plot like this (thick line is the true intensity, the thin solid line the inlabru fit, the dashed line the GAM fit:
plot(pred_spde) + geom_point(data = pts2, aes(x = x), y = 0, pch = "|", cex = 2) + geom_line(data = dat4pred2, aes(x, gam / exposure, colour = "gam()"), lty = 2) + geom_line(data = true.lambda, aes(x, y, colour = "True"), lwd = 1.5) + geom_point(data = cd2, aes(x, y = count / exposure)) + ylab("Intensity") + xlab("x") # SOLUTION
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.