knitr::opts_chunk$set(cache=F,fig.width=5,fig.height=5,dpi=125)

set.seed(123)

Introduction

This particular vignette was written to explain how to build point process species distribution models using mapSpecies. In essence, this vignette shows how to use the ppSpace function and a few helper functions to build models.

The modelling approach implemented in the ppSpace function is essentially a specialized wrapper around the INLA R package so that it can be easier to construct point process models as they were presented by @simpson_going_2016.

To illustrate how the models are constructed, the Tinamus solitarius data, available in the bossMaps, will be used.

Load R package for the analysis

library(mapSpecies)

Data

REwrite this section based on the new data used

The data used here comes from @borcard_environmental_1994. For the purpose of this illustration, substrate density and water content (the two continuous variables) have been interpolated across the sampling area with kriging while the class variables were reconstructed from Figure 1 of @borcard_environmental_1994. These data are all available in mite.envRaster a RasterStack. As for the species data, the sampled coordinates and the species abundances available in the vegan R package were organized in a SpatialPointDataFrame and are available in mite.spdf.

Species data

#library(bossMaps)

# Load species data
data(Tinamus_solitarius_points)

Environmental data

# Environmental data
data(Tinamus_solitarius_env)

Note that if some explanatory variables are factors, to preserve there structure as factors, a RasterStack needs to be used because a RasterBrick will convert the data into a matrix and the factor level information will be lost.

Sampling region

For our illustration, let's also build a SpatialPolygons outlining the sampling area. This will become handy for a few steps when performing the analysis.

rasterForPoly <- Tinamus_solitarius_env[[1]]
values(rasterForPoly) <- ifelse(is.na(values(rasterForPoly)), NA, 1)

spacePoly <- rasterToPolygons(rasterForPoly, dissolve = TRUE)

Building the mesh

The core of the analyses carried out in this document will be performed using the ppSpace function. The first step that needs to be carried out to use this function (and thus to construct a spatially constrained model) is to build a Delaunay triangulation, a mesh, that will serve as a basis to construct the spatial component of the model through an SPDE approach.

Properly choosing the structure of the mesh is essential for the model estimation to be carried out properly. Detailed explanation on the dos and don'ts of constructing a mesh is presented in section 1.3 of the R-INLA tutorial on SPDE models. For a quick overview, take a look at the mesh vignette.

For our illustration, let's use the following mesh.

regionBorder <- extent(spacePoly)
xyBasis <- rbind(cbind(c(xmin(spacePoly),xmax(spacePoly),
                         xmin(spacePoly),xmax(spacePoly)),
                       c(ymin(spacePoly),ymax(spacePoly), 
                         ymax(spacePoly),ymin(spacePoly))),
                 coordinates(Tinamus_solitarius_points))

Mesh <- inla.mesh.2d(loc.domain = xyBasis,
                     max.edge = 4,
                     min.angle = 20,
                     cutoff = 0.5,
                     offset = c(5,5),
                     crs = crs(spacePoly))
par(mar = c(1,1,1,1))

plot(Mesh, main = "", asp = 1)
points(coordinates(Tinamus_solitarius_points), pch = 19, col = "red")

# Number of edges in the mesh
Mesh$n

Organizing the explanatory variables

To build a model with ppSpace explanatory variables value need to be gathered for all edges of the mesh, even the ones outside the sampling region. This is important to prevent edge effect problems that may arise in model such as the one used here.

The function explanaMesh is design to gather this information. In addition, the resulting object includes the Raster* of explanatory variable(s) and the mesh.

Note that running the explanaMesh function may take a bit of time.

explana <- explanaMesh(sPoly = spacePoly, 
                       mesh = Mesh, 
                       X = Tinamus_solitarius_env)

Calculating weights associated to each edges of the mesh

The approach proposed by @simpson_going_2016 suggest that a dual mesh diagram should be used to calculate the area around each edge and use this area as weight for the whole sampling region. For the mesh we constructed above the dual mesh looks likes this

dMesh <- mapSpecies:::inla.mesh.dual(Mesh)
par(mar = c(1,1,1,1))
plot(Mesh, asp = 1, main = "")
plot(dMesh, border = "blue", add = TRUE)

Note that the dual mesh shows some similarity with a Voronoï diagram.

To calculate the area of the cell associated to each mesh edge, we need to use the ppWeight function.

weight <- ppWeight(sPoly = spacePoly, mesh = Mesh)

By taking a look at the calculated weights, it can be noticed that some weights are equal to 0. This is because the associated cells are completely outside the sampling area.

Building the model

Now that all the pieces are constructed we can estimate the model using ppSpace.

Note that, we included the argument control.compute, an argument from INLA, to compute a Watanabe-Akaike information criterion (WAIC). Usually, the WAIC is used to compare different models, however here it was included simply to show that ppSpace will pass additional arguments to the inla function.

What is important to be aware at this point is that ppSpace is essentially a specialized wrapper around the inla function of the R package INLA. As such, all arguments that can be passed to inla can be passed to ppSpace.

A particularity of ppSpace is that in the formula, the response variable has to be y.

Below two models were estimated with the sole difference that one was estimated using the argument many = TRUE. The idea here is to show the differences in results but also in speed.

modelPP <- ppSpace(y ~ ., sPoints = Tinamus_solitarius_points,
                   explanaMesh = explana,
                   ppWeight = weight,
                   control.compute = list(waic = TRUE))

modelPPmany <- ppSpace(y ~ ., sPoints = Tinamus_solitarius_points,
                   explanaMesh = explana,
                   ppWeight = weight,
                   many = TRUE,
                   control.compute = list(waic = TRUE))

Studying the estimated parameters

Although one of the main interest of species distribution models is to estimate and predict the distribution of species, it is also relevant and important to study the estimated parameters to better understand the importance of the considered explanatory variable in structuring the distribution of the species. We can study these parameters using the summary function

summary(modelPP)

To see the differences between modelPP and modelPPmany, lets focus on the estimated coefficients and the time it took to perform the estimation

# Estimated coefficients
summary(modelPP)$coefficients
summary(modelPPmany)$coefficients

# Running time
summary(modelPP)$runningTime
summary(modelPPmany)$runningTime

As can be seen the model parameters are different but the time it took to run the model is reduced because the parameter estimation had to be done at a smaller number of locations.

Species distribution map

To get a good idea of the quality of the map resulting from the model, in addition of plotting the average model, it can be valuable to also plot the standard deviation or a 95% confidence interval around the model. All of these maps can be constructed using mapSpace.

#--------------------------
# Construct prediction maps
#--------------------------
# Mean
mapMean <- mapSpace(modelPP,
                    dims = dim(Tinamus_solitarius_env)[1:2],
                    type = "mean",
                    sPoly = spacePoly)
# Standard deviation
mapSd <- mapSpace(modelPP,
                    dims = dim(Tinamus_solitarius_env)[1:2],
                    type = "sd",
                    sPoly = spacePoly)
# Lower boundary of the 95% confidence interval
map.025 <- mapSpace(modelPP,
                    dims = dim(Tinamus_solitarius_env)[1:2],
                    type = "0.025quant",
                    sPoly = spacePoly)
# Upper boundary of the 95% confidence interval
map.975 <- mapSpace(modelPP,
                    dims = dim(Tinamus_solitarius_env)[1:2],
                    type = "0.975quant",
                    sPoly = spacePoly)

#--------------------------------------
# Cut raster with polygon of the region
#--------------------------------------
mapMaskMean <- mask(mapMean, spacePoly)
mapMaskSd <- mask(mapSd, spacePoly)
mapMask.025 <- mask(map.025, spacePoly)
mapMask.975 <- mask(map.975, spacePoly)
# Colour to use for the maps
colo <- colorRampPalette(c("grey90", "steelblue4", 
                           "steelblue2", "steelblue1", 
                           "gold", "red1", "red4"))(200)

par(mfrow = c(2,2), mar = c(1,1,5,8))
plot(mapMaskMean, col = colo, zlim = c(0, 20),
     axes = FALSE, box = FALSE, main = "Mean")

plot(mapMaskSd, col = colo,  zlim = c(0, 20),
     axes = FALSE, box = FALSE, main = "Sd")

plot(mapMask.025, col = colo,  zlim = c(0, 20),
     axes = FALSE, box = FALSE, main = "2.5%")

plot(mapMask.975, col = colo,  zlim = c(0, 20),
     axes = FALSE, box = FALSE, main = "97.5")

By studying the mean distribution we can infer the distribution of the species in the study area, but by accounting for the standard deviation and the 95% confidence interval, we can also gain some knowledge about the area where we have high (or low) confidence in the prediction.

Note that the map presents an intensity rather than a probability of occurrence.

References



ReseauBiodiversiteQuebec/mapSpecies documentation built on Dec. 18, 2021, 9:57 a.m.