knitr::opts_chunk$set(cache = FALSE, fig.width = 5, fig.height = 5, dpi = 125)
set.seed(123)
This vignette explains how to build point process species distribution models with mapSpecies. In essence, it shows how to use the ppSpace function and a few helper functions to build such models.
The modelling approach implemented in ppSpace is essentially a specialized wrapper around the INLA R package that makes it easier to construct point process models as presented by @simpson_going_2016.
To illustrate how the models are constructed, the Tinamus solitarius data available in the bossMaps package will be used.
library(mapSpecies)
A second example dataset comes from @borcard_environmental_1994. For that dataset, substrate density and water content (the two continuous variables) were interpolated across the sampling area with kriging, while the class variables were reconstructed from Figure 1 of @borcard_environmental_1994. These environmental data are available in mite.envRaster, a RasterStack. As for the species data, the sampled coordinates and the species abundances available in the vegan R package were organized into a SpatialPointsDataFrame and are available in mite.spdf.
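If one wanted to work with that dataset instead of the Tinamus solitarius data used below, it could be loaded in the same way. This is only a sketch: it assumes both mite objects ship with mapSpecies, so the data() calls would need to be adjusted if they are stored elsewhere.

# Hypothetical: load the mite example data instead of the Tinamus data
# (assumes mite.envRaster and mite.spdf are shipped with mapSpecies)
data(mite.envRaster)
data(mite.spdf)

mite.envRaster # RasterStack of explanatory variables
mite.spdf      # SpatialPointsDataFrame of coordinates and abundances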
#library(bossMaps)

# Load species data
data(Tinamus_solitarius_points)

# Environmental data
data(Tinamus_solitarius_env)
Note that if some explanatory variables are factors, a RasterStack needs to be used to preserve their structure as factors, because a RasterBrick will convert the data into a matrix and the factor level information will be lost.
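As a quick check, the raster package (loaded with mapSpecies) can report which layers of a stack are stored as factors. If all layers of the Tinamus solitarius environmental data are continuous, the check below simply returns FALSE for each layer.

# Check which layers of the stack are treated as factors (raster package)
is.factor(Tinamus_solitarius_env)

# Inspect the levels of the factor layers, if any are present
levels(Tinamus_solitarius_env)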
For our illustration, let's also build a SpatialPolygons object outlining the sampling area. This will come in handy at a few steps of the analysis.
rasterForPoly <- Tinamus_solitarius_env[[1]]
values(rasterForPoly) <- ifelse(is.na(values(rasterForPoly)), NA, 1)
spacePoly <- rasterToPolygons(rasterForPoly, dissolve = TRUE)
The core of the analyses carried out in this document is performed with the ppSpace function. The first step required to use this function (and thus to construct a spatially constrained model) is to build a Delaunay triangulation, a mesh, which serves as the basis for constructing the spatial component of the model through an SPDE approach.
Choosing the structure of the mesh well is essential for the model to be estimated properly. Detailed explanations of the dos and don'ts of constructing a mesh are presented in section 1.3 of the R-INLA tutorial on SPDE models. For a quick overview, take a look at the mesh vignette.
For our illustration, let's use the following mesh.
regionBorder <- extent(spacePoly)

xyBasis <- rbind(cbind(c(xmin(spacePoly), xmax(spacePoly),
                         xmin(spacePoly), xmax(spacePoly)),
                       c(ymin(spacePoly), ymax(spacePoly),
                         ymax(spacePoly), ymin(spacePoly))),
                 coordinates(Tinamus_solitarius_points))

Mesh <- inla.mesh.2d(loc.domain = xyBasis,
                     max.edge = 4,
                     min.angle = 20,
                     cutoff = 0.5,
                     offset = c(5, 5),
                     crs = crs(spacePoly))
par(mar = c(1, 1, 1, 1))
plot(Mesh, main = "", asp = 1)
points(coordinates(Tinamus_solitarius_points), pch = 19, col = "red")

# Number of nodes in the mesh
Mesh$n
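The resolution of the mesh is controlled mainly by max.edge and cutoff. As a point of comparison, the sketch below builds a coarser mesh on the same domain with arbitrary values chosen only for illustration; coarser meshes are faster to estimate but capture less spatial detail.

# A coarser mesh for comparison (values are illustrative only)
MeshCoarse <- inla.mesh.2d(loc.domain = xyBasis,
                           max.edge = 8,
                           min.angle = 20,
                           cutoff = 1,
                           offset = c(5, 5),
                           crs = crs(spacePoly))

# Compare the number of nodes in the two meshes
c(fine = Mesh$n, coarse = MeshCoarse$n)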
To build a model with ppSpace, the values of the explanatory variables need to be gathered at all nodes of the mesh, even the ones outside the sampling region. This is important to prevent edge-effect problems that may arise in models such as the one used here.
The function explanaMesh is designed to gather this information. In addition, the resulting object includes the Raster* of explanatory variable(s) and the mesh.
Note that running the explanaMesh function may take a bit of time.
explana <- explanaMesh(sPoly = spacePoly, mesh = Mesh, X = Tinamus_solitarius_env)
The approach proposed by @simpson_going_2016 suggests that a dual mesh diagram should be used to calculate the area around each node, and that this area be used as a weight across the whole sampling region. For the mesh constructed above, the dual mesh looks like this:
dMesh <- mapSpecies:::inla.mesh.dual(Mesh)

par(mar = c(1, 1, 1, 1))
plot(Mesh, asp = 1, main = "")
plot(dMesh, border = "blue", add = TRUE)
Note that the dual mesh shows some similarity with a Voronoï diagram.
To calculate the area of the cell associated with each mesh node, we need to use the ppWeight function.
weight <- ppWeight(sPoly = spacePoly, mesh = Mesh)
A look at the calculated weights shows that some of them are equal to 0. This is because the associated cells fall completely outside the sampling area.
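To see this, the weights can be inspected directly. The short sketch below assumes the object returned by ppWeight behaves like a numeric vector of dual-mesh cell areas, one per mesh node; if it is stored differently, it would first need to be coerced to a vector.

# Number of nodes whose dual-mesh cell lies entirely outside the sampling area
# (assumes 'weight' behaves like a numeric vector of cell areas)
sum(weight == 0)

# Summary of the non-zero weights (cell areas inside the sampling area)
summary(as.numeric(weight[weight > 0]))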
Now that all the pieces are in place, we can estimate the model using ppSpace.
Note that we included the argument control.compute, an argument from INLA, to compute the Watanabe-Akaike information criterion (WAIC). Usually the WAIC is used to compare different models; here it was included simply to show that ppSpace will pass additional arguments to the inla function.
What is important to be aware of at this point is that ppSpace is essentially a specialized wrapper around the inla function of the R package INLA. As such, all arguments that can be passed to inla can also be passed to ppSpace.
A particularity of ppSpace is that, in the formula, the response variable has to be y.
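For example, any of the usual inla controls can be forwarded in the same way. The call below is illustrative only and is not run here: verbose and control.inla are standard arguments of INLA::inla, and the empirical Bayes integration strategy is chosen arbitrarily for the example.

# Illustrative only (not run): forwarding additional INLA arguments through ppSpace
# ('verbose' and 'control.inla' are standard arguments of INLA::inla;
#  the integration strategy "eb" is chosen arbitrarily for this sketch)
modelVerbose <- ppSpace(y ~ .,
                        sPoints = Tinamus_solitarius_points,
                        explanaMesh = explana,
                        ppWeight = weight,
                        verbose = TRUE,
                        control.inla = list(int.strategy = "eb"))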
Below, two models are estimated, with the sole difference that one uses the argument many = TRUE. The idea is to show the differences in results but also in speed.
modelPP <- ppSpace(y ~ .,
                   sPoints = Tinamus_solitarius_points,
                   explanaMesh = explana,
                   ppWeight = weight,
                   control.compute = list(waic = TRUE))

modelPPmany <- ppSpace(y ~ .,
                       sPoints = Tinamus_solitarius_points,
                       explanaMesh = explana,
                       ppWeight = weight,
                       many = TRUE,
                       control.compute = list(waic = TRUE))
Although one of the main interests of species distribution models is to estimate and predict the distribution of species, it is also relevant and important to study the estimated parameters to better understand the importance of the considered explanatory variables in structuring the distribution of the species. We can study these parameters using the summary function.
summary(modelPP)
To see the differences between modelPP and modelPPmany, let's focus on the estimated coefficients and on the time it took to perform the estimation.
# Estimated coefficients
summary(modelPP)$coefficients
summary(modelPPmany)$coefficients

# Running time
summary(modelPP)$runningTime
summary(modelPPmany)$runningTime
As can be seen, the model parameters differ, but the time it took to run the model is reduced because the parameter estimation is carried out at a smaller number of locations.
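A quick side-by-side view of the fixed-effect estimates makes this comparison easier to read. The sketch below only reuses the coefficient tables shown above and assumes their first column holds the posterior mean.

# Side-by-side comparison of the posterior means of the fixed effects
# (assumes the first column of the coefficient tables holds the posterior mean)
coefPP     <- summary(modelPP)$coefficients
coefPPmany <- summary(modelPPmany)$coefficients
cbind(full = coefPP[, 1], many = coefPPmany[, 1])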
To get a good idea of the quality of the map resulting from the model, in addition to plotting the mean of the model, it can be valuable to also plot the standard deviation or a 95% confidence interval around the model. All of these maps can be constructed using mapSpace.
#--------------------------
# Construct prediction maps
#--------------------------
# Mean
mapMean <- mapSpace(modelPP,
                    dims = dim(Tinamus_solitarius_env)[1:2],
                    type = "mean",
                    sPoly = spacePoly)

# Standard deviation
mapSd <- mapSpace(modelPP,
                  dims = dim(Tinamus_solitarius_env)[1:2],
                  type = "sd",
                  sPoly = spacePoly)

# Lower boundary of the 95% confidence interval
map.025 <- mapSpace(modelPP,
                    dims = dim(Tinamus_solitarius_env)[1:2],
                    type = "0.025quant",
                    sPoly = spacePoly)

# Upper boundary of the 95% confidence interval
map.975 <- mapSpace(modelPP,
                    dims = dim(Tinamus_solitarius_env)[1:2],
                    type = "0.975quant",
                    sPoly = spacePoly)

#--------------------------------------
# Cut raster with polygon of the region
#--------------------------------------
mapMaskMean <- mask(mapMean, spacePoly)
mapMaskSd <- mask(mapSd, spacePoly)
mapMask.025 <- mask(map.025, spacePoly)
mapMask.975 <- mask(map.975, spacePoly)
# Colours to use for the maps
colo <- colorRampPalette(c("grey90", "steelblue4", "steelblue2",
                           "steelblue1", "gold", "red1", "red4"))(200)

par(mfrow = c(2, 2), mar = c(1, 1, 5, 8))
plot(mapMaskMean, col = colo, zlim = c(0, 20),
     axes = FALSE, box = FALSE, main = "Mean")
plot(mapMaskSd, col = colo, zlim = c(0, 20),
     axes = FALSE, box = FALSE, main = "Sd")
plot(mapMask.025, col = colo, zlim = c(0, 20),
     axes = FALSE, box = FALSE, main = "2.5%")
plot(mapMask.975, col = colo, zlim = c(0, 20),
     axes = FALSE, box = FALSE, main = "97.5%")
By studying the mean distribution we can infer the distribution of the species in the study area, but by accounting for the standard deviation and the 95% confidence interval, we can also gain some knowledge about the areas where we have high (or low) confidence in the prediction.
Note that the map presents an intensity rather than a probability of occurrence.
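If a relative measure is preferred, the intensity surface can be rescaled so that its cells sum to one, giving the relative contribution of each cell to the overall expected number of points. This is a simple post-hoc rescaling sketch, not part of mapSpecies; it relies only on raster's cellStats.

# Rescale the mean intensity map so that its cells sum to 1, giving a
# relative intensity surface (still not a probability of occurrence)
mapMaskRel <- mapMaskMean / cellStats(mapMaskMean, stat = "sum")
plot(mapMaskRel, col = colo, axes = FALSE, box = FALSE,
     main = "Relative intensity")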