knitr::opts_chunk$set(fig=TRUE, warning=FALSE, message=FALSE, eval=TRUE, cache=FALSE, comment = '#>', collapse=TRUE, dev='png')
This vignette describes how the user may specify their own choices or choices driven by a variogram for the range parameter. This may be necessary on occasions where the data is very patchy.
::: callout-important Note that this page shows an example of the use of the variogram method for choosing a sequence of $r$ parameters. No effort has been made to model this data in the best way possible. :::
The data we shall use for this example is from a Danish offshore windfarm and is part of the MRSea
package. The data are counts of birds collected along transects over a number of surveys and years. In this first example, we will use all of the data together and assess if there is a relationship between number of birds and sea depth.
library(dplyr) library(MRSea) library(ggplot2)
# load the data data("nysted.analysisdata") wfdata <- filter(nysted.analysisdata, impact==0, season==1) # load the prediction grid data("nysted.predictdata") preddata <- filter(nysted.predictdata, impact==0, season==1)
ggplot(preddata) + geom_tile(aes(x=x.pos, y=y.pos, fill=truth.re, height=sqrt(area), width=sqrt(area))) + scale_fill_distiller(palette = "Spectral",name="No. Birds") + xlab("Easting (km)") + ylab("Northing (km)") + theme_bw() + ggtitle("The true surface")
ggplot(wfdata) + geom_tile(aes(x=x.pos, y=y.pos, fill=response, height=sqrt(area), width=sqrt(area))) + scale_fill_distiller(palette = "Spectral",name="No. Birds") + xlab("Easting (km)") + ylab("Northing (km)") + theme_bw() + ggtitle("Simulated Data")
Set up the initial model with the offset term (if required) and specify the parameters required. Here we add an offset to be the size of the segment associated with the bird counts. In reality, our bird counts are over a particular area so we have counts per unit area. The initial model contains the offset information, and specifies the family of model.
initialModel <- glm(response ~ 1 + offset(log(area)), family = "quasipoisson", data = wfdata)
1) Create the knot grid. Here we select 300 space-filled locations from our data.
set.seed(123) knotgrid<- getKnotgrid(coordData = cbind(wfdata$x.pos, wfdata$y.pos), numKnots = 300, plot = FALSE)
2) Calculate the distances between the knot points and all data points and also the distances between all the knots.
distMats <- makeDists(cbind(wfdata$x.pos, wfdata$y.pos), knotgrid)
3) Next we specify some additional parameters needed to run the SALSA2D algorithm:
r_seq
which is a sequence of range/radii values determining the influence of each basis function. This is the $r$ parameter in the basis function.The model initialises with the central value of this $r$ sequence. If r_seq
is not specified then the model will choose the range parameters using the existing getRadiiChoices()
function. Using this function with 10 choices selected and specifying it in salsa2dlist
is the equivalent of not specifying it at all and letting runSALSA2D
calculate them.
(rs.orig <- getRadiiSequence(method = "original", numberofradii = 10, distMatrix = distMats$dataDist, basis = "gaussian"))
# make parameter set for running salsa2d salsa2dlist<-list(fitnessMeasure = 'QBIC', knotgrid = knotgrid, startKnots = 10, minKnots = 4, maxKnots = 15, gap = 0, r_seq = rs.orig) ##
1) Fit a variogram to the location and response data using the gstat
package for R
[@pebesma2004].
2) Use a spherical model to find the range parameter (the distance where response values are considered no longer correlated)
3) Use this as the central value for the sequence (the value the gamMRSea
model will initialise on)
4) Create a sequence of values to allow for more local and more global bases. This is based on the lags chosen by the function gstat::variogram
.
rs <- getRadiiSequence(method = "variogram", numberofradii = 10, xydata = wfdata[, c("x.pos", "y.pos")], response = log(wfdata$NHAT + 1), basis = "gaussian", distMatrix = distMats$dataDist) rs
The sequence of $r$'s in the rs
object also contains a table as part of the attributes. This shows the variogram model fit and the range parameter used as the central point of the sequence. In this case, the range parameter is r round(attributes(rs)$vg.fit[2,3],2)
which suggests on average, the spatial correlation decays after approximately r round(attributes(rs)$vg.fit[2,3])
km.
Visualising the bases:
To have a look at the two methods (the default vs the variogram) we can look at how the min, max and middle bases of the sequence appear. Note that the variogram method will always have an odd length sequence owing to the sequence being based on the middle value.
par(mfrow=c(3,2)) b1 <-LRF.g(radiusIndices = 1, dists = distMats$dataDist, radii = rs, aR = 149) fields::quilt.plot(wfdata$x.pos, wfdata$y.pos, b1, zlim=c(0,1), main="r1 variogram") b1.orig <-LRF.g(radiusIndices = 1, dists = distMats$dataDist, radii = rs.orig, aR = 149) fields::quilt.plot(wfdata$x.pos, wfdata$y.pos, b1.orig, zlim=c(0,1), main="r1 original") b5 <-LRF.g(radiusIndices = 5, dists = distMats$dataDist, radii = rs, aR = 149) fields::quilt.plot(wfdata$x.pos, wfdata$y.pos, b5, zlim=c(0,1), main="r5 variogram") b5.orig <-LRF.g(radiusIndices = 5, dists = distMats$dataDist, radii = rs.orig, aR = 149) fields::quilt.plot(wfdata$x.pos, wfdata$y.pos, b5.orig, zlim=c(0,1), main="r5 original") b9 <-LRF.g(radiusIndices = 9, dists = distMats$dataDist, radii = rs, aR = 149) fields::quilt.plot(wfdata$x.pos, wfdata$y.pos, b9, zlim=c(0,1), main="r9 variogram") b10.orig <-LRF.g(radiusIndices = 10, dists = distMats$dataDist, radii = rs.orig, aR = 149) fields::quilt.plot(wfdata$x.pos, wfdata$y.pos, b10.orig, zlim=c(0,1), main="r10 original")
There is not a huge difference with the original here but the variogram method suggests slightly smaller bases in general.
::: callout-important Note that if the variogram method selects a small range compared with the size of the distances available in your surface then the model may support many more knots. In this case you might consider starting with a larger number. :::
Let's have a look at the modelling differences. First the original parametrisation.
# make parameter set for running salsa2d salsa2dlist<-list(fitnessMeasure = 'QBIC', knotgrid = knotgrid, startKnots = 10, minKnots = 4, maxKnots = 15, gap = 0, r_seq = rs.orig) ##
salsa2dOutput.origr <- runSALSA2D(model = initialModel, salsa2dlist = salsa2dlist, d2k=distMats$dataDist, k2k=distMats$knotDist, suppress.printout = TRUE) salsa2dOutput.origr <- salsa2dOutput.origr$bestModel
salsa2dlist$r_seq <- rs salsa2dOutput.vario <-runSALSA2D(model = initialModel, salsa2dlist = salsa2dlist, d2k=distMats$dataDist, k2k=distMats$knotDist, suppress.printout = TRUE) salsa2dOutput.vario <- salsa2dOutput.vario$bestModel
# radius indices salsa2dOutput.origr$splineParams[[1]]$radiusIndices salsa2dOutput.vario$splineParams[[1]]$radiusIndices
# CV scores cv.gamMRSea(wfdata, salsa2dOutput.origr, K=10, s.eed=154)$delta[2] cv.gamMRSea(wfdata, salsa2dOutput.vario, K=10, s.eed=154)$delta[2]
The variogram method
preddist<-makeDists(cbind(preddata$x.pos, preddata$y.pos), knotgrid, knotmat=FALSE)$dataDist # make predictions on response scale preds.orig<-predict(newdata = preddata, g2k = preddist, object = salsa2dOutput.origr) preds.vario<-predict(newdata = preddata, g2k = preddist, object = salsa2dOutput.vario)
a <- ggplot(preddata) + geom_tile(aes(x=x.pos, y=y.pos, fill=preds.orig, height=sqrt(area), width=sqrt(area))) + scale_fill_distiller(palette = "Spectral",name="No. Birds") + xlab("Easting (km)") + ylab("Northing (km)") + theme_bw() + coord_equal() + geom_point(data=data.frame(knotgrid[salsa2dOutput.origr$splineParams[[1]]$knotPos,]), aes(X1, X2))+ ggtitle("Default r") b <- ggplot(preddata) + geom_tile(aes(x=x.pos, y=y.pos, fill=preds.vario, height=sqrt(area), width=sqrt(area))) + scale_fill_distiller(palette = "Spectral",name="No. Birds") + xlab("Easting (km)") + ylab("Northing (km)") + theme_bw() + coord_equal()+ geom_point(data=data.frame(knotgrid[salsa2dOutput.vario$splineParams[[1]]$knotPos,]), aes(X1, X2)) + ggtitle("Variogram based r") ggpubr::ggarrange(a,b, common.legend = TRUE, ncol = 1, legend = "right")
c <- ggplot(preddata) + geom_tile(aes(x=x.pos, y=y.pos, fill=(preds.orig - truth.re), height=sqrt(area), width=sqrt(area))) + scale_fill_distiller(palette = "Spectral",name="No. Birds") + xlab("Easting (km)") + ylab("Northing (km)") + theme_bw() + coord_equal() + ggtitle("Default r") d <- ggplot(preddata) + geom_tile(aes(x=x.pos, y=y.pos, fill=(preds.vario-truth.re), height=sqrt(area), width=sqrt(area))) + scale_fill_distiller(palette = "Spectral",name="No. Birds") + xlab("Easting (km)") + ylab("Northing (km)") + theme_bw() + coord_equal() + ggtitle("Variogram based r") ggpubr::ggarrange(c, d, common.legend = TRUE, ncol = 1, legend = "right")
1) The Gaussian RBF is defined in [@scott-hayward2022] as:
$$e^{-(d*r)^2}$$
but may equally be presented as: $$e^{-(d⁄(2\sigma^2))}$$ In this latter equation, σ represents the range parameter and so the link to the $r$ parameter is: $r= \sqrt{2}\sigma$
2) The exponential RBF is defined in [@scott-hayward2015] as: $$e^{(d⁄r^2)}$$ It may also be presented as $$e^{(d⁄V)}$$
In this case, $V$ represents the range parameter and so the link to the $r$ parameter is $r=\sqrt{V}$
In the models fitted here, the range parameter was found to be r round(attributes(rs)$vg.fit[2,3])
indicating that on average there ceased to be spatial correlation after approximately r round(attributes(rs)$vg.fit[2,3])
km.
# # sk <- round(max(distMats$dataDist)/attr(rs, "vg.fit")$range[2]) # # salsa2dlist$startKnots <- sk # # salsa2dOutput.vario.sk <-runSALSA2D(model = initialModel, # salsa2dlist = salsa2dlist, # d2k=distMats$dataDist, # k2k=distMats$knotDist, # suppress.printout = TRUE) # # set.seed(154) # cv.gamMRSea(wfdata, salsa2dOutput.vario.sk$bestModel, K=10)$delta[2] # # preds.vario.sk<-predict(newdata = preddata, # g2k = preddist, # object = salsa2dOutput.vario.sk$bestModel) # ggplot(preddata) + # geom_tile(aes(x=x.pos, y=y.pos, fill=preds.vario.sk, height=sqrt(area), width=sqrt(area))) + # scale_fill_distiller(palette = "Spectral",name="No. Birds") + # xlab("Easting (km)") + ylab("Northing (km)") + theme_bw() + # coord_equal()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.