options(rmarkdown.html_vignette.check_title = FALSE) knitr::opts_chunk$set( comment = "", cache = TRUE )
\newcommand{\bm}{\boldsymbol}
This vignette details how to include random intercepts when fitting single-species or multi-species occupancy models in spOccupancy
. For an introduction to the basic spOccupancy
functionality, see the introductory vignette. spOccupancy
supports random intercepts in the occurrence and detection portions of all single-species and multi-species occupancy models, with the exception that we have yet to implement random intercepts in any integrated occupancy models. Adding random intercepts to integrated models and random slopes to all models is part of future planned development. Here I show how to simulate data for a spatially explicit single-species occupancy model with random intercepts in occurrence and detection using simOcc()
, and subsequently show how to include random intercepts using lme4
syntax [@bates2015] with spPGOcc()
. Random intercepts are included in all other single-species and multi-species models in an analogous manner.
simOcc()
The function simOcc()
simulates single-species detection-nondetection data. simOcc()
has the following arguments.
simOcc(J.x, J.y, n.rep, beta, alpha, psi.RE = list(), p.RE = list(), sp = FALSE, cov.model, sigma.sq, phi, nu, ...)
J.x
and J.y
indicate the number of spatial locations to simulate data along a horizontal and vertical axis, respectively, such that J.x * J.y
is the total number of sites (i.e., J
). n.rep
is a numeric vector of length J
that indicates the number of replicates at each of the J sites. beta
and alpha
are numeric vectors containing the intercept and any regression coefficient parameters for the occurrence and detection portions of the occupancy model, respectively. psi.RE
and p.RE
are lists that are used to specify random intercepts on occurrence and detection, respectively. These are only specified when we want to simulate data with random intercepts. Each list should be comprised of two tags: levels
, a vector that specifies the number of levels for each random effect included in the model, and sigma.sq.psi
or sigma.sq.p
, which specify the variances of the random effects for each random effect included in the model. sp
is a logical value indicating whether to simulate data with a spatial Gaussian process. cov.model
specifies the covariance function used to model the spatial dependence structure, with supported values of exponential
, matern
, spherical
, and gaussian
. Finally, sigma.sq
is the spatial variance parameter, phi
is the spatial range parameter, and nu
is the spatial smoothness parameter (only applicable when cov.model = 'matern'
). Below we simulate data across 225 sites with 1-4 replicates at a given site, a single covariate effect on occurrence, a single covariate effect on detection, spatial autocorrelation following a spherical correlation function, and a random intercept on occurrence and detection.
library(spOccupancy) set.seed(1000) J.x <- 20 J.y <- 20 # Total number of sites (J <- J.x * J.y) # Number of replicates at each site n.rep <- sample(2:4, J, replace = TRUE) # Intercept and covariate effect on occurrence beta <- c(-0.5, -0.2) # Intercept and covariate effect on detection alpha <- c(0.9, -0.3) # Single random intercept on occurrence psi.RE <- list(levels = 20, sigma.sq.psi = 0.75) # Single random intercept on detection p.RE <- list(levels = 25, sigma.sq.p = 1.1) # Spatial range phi <- 3 / .8 # Spatial variance sigma.sq <- 1 # Simulate the data dat <- simOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, beta = beta, alpha = alpha, psi.RE = psi.RE, p.RE = p.RE, sp = TRUE, sigma.sq = sigma.sq, phi = phi, cov.model = 'spherical')
For the occurrence random effect, we assumed there were 20 levels and a variance of 0.75. For example, we could suppose these levels corresponded to different administrative units across the 225 sites we simulated, and we want to account for potential correlation between sites within each of the units. For the detection random effect, we assumed there were 25 levels and a variance of 1.1. For example, we could suppose there were 25 different observers that collected the data, and we wanted to account for variation in observer skill (and thus detection probability) across the different observers.
Next, let's explore the simulated data a bit before we move on (plotting code adapted from @hooten2019bringing).
str(dat)
The simulated data object consists of the following objects: X
(the occurrence design matrix), X.p
(the detection design matrix), coords
(the spatial coordinates of each site), w
(the latent spatial process, psi
(occurrence probability), z
(the latent occupancy status), y
(the detection-nondetection data, X.p.re
(the detection random effect levels for each site), X.re
(the occurrence random effect levels for each site), alpha.star
(the detection random effects for each level of the random effect), beta.star
(the occurrence random effects for each level of the random effect).
# Detection-nondetection data y <- dat$y # Occurrence design matrix for fixed effects X <- dat$X # Detection design matrix for fixed effets X.p <- dat$X.p # Occurrence design matrix for random effects X.re <- dat$X.re # Detection design matrix for random effects X.p.re <- dat$X.p.re # Occurrence values psi <- dat$psi # Spatial coordinates coords <- dat$coords # Simple plot of the occurrence probability across space. # Dark points indicate high occurrence. plot(coords, type = "n", xlab = "", ylab = "", asp = TRUE, main = "Simulated Occurrence", bty = 'n') points(coords, pch=15, cex = 2.1, col = rgb(0,0,0,(psi-min(psi))/diff(range(psi))))
We see there is clear variation in occurrence probability across the simulated spatial region. The final step before we can fit the model is to package up the data in a list for use in spOccupancy
model fitting functions. This requires creating a list that consists of the detection-nondetection data (y
), occurrence covariates (occ.covs
), detection covariates (det.covs
), and coordinates (coords
). See the introductory vignette for more details.
# Package all data into a list # Occurrence covariates consists of the fixed effects and random effect occ.covs <- cbind(X[, 2], X.re) colnames(occ.covs) <- c('occ.cov.1', 'occ.factor.1') # Detection covariates consists of the fixed effects and random effect det.covs <- list(det.cov.1 = X.p[, , 2], det.factor.1 = X.p.re[, , 1]) # Package into a list for spOccupancy data.list <- list(y = y, occ.covs = occ.covs, det.covs = det.covs, coords = coords) # Take a look at the data structure. str(data.list) # Take a look at the occurrence covariates head(occ.covs)
One important thing to note about including random effects in spOccupancy
is that we must supply the random effects in as numeric values. Notice in the occ.factor.1
column in occ.covs
, the random effect levels are specified as a numeric value, which indicates the specific level of the random effect at that given site. spOccupancy
will return an informative error if we supply random effects as factors or character vectors and will tell us to convert them to numeric in order to fit the model.
spPGOcc()
We now fit the model using spPGOcc()
. Random effects are included in the model formulas using standard lme4
notation. Below we run a spatially-explicit occupancy model for 400 batches each of length 25 (10000 total MCMC iterations). We use the default tuning and prior values that spOccupancy
provides. We use a Nearest Neighbor Gaussian Process with 10 neighbors and the spherical spatial correlation function.
inits <- list(alpha = 0, beta = 0, sigma.sq.psi = 0.5, sigma.sq.p = 0.5, z = apply(dat$y, 1, max, na.rm = TRUE), sigma.sq = 1, phi = 3 / 0.5) out.full <- spPGOcc(occ.formula = ~ occ.cov.1 + (1 | occ.factor.1), det.formula = ~ det.cov.1 + (1 | det.factor.1), data = data.list, n.batch = 400, batch.length = 25, inits = inits, accept.rate = 0.43, cov.model = "spherical", NNGP = TRUE, n.neighbors = 10, n.report = 100, n.burn = 5000, n.thin = 5, n.chains = 1) summary(out.full)
The summary of the model output shows our model recovers the values we used to simulate the data. Let's also fit a model that doesn't include the occurrence and detection random effects, and compare their performance using the WAIC [@watanabe2010].
out.no.re <- spPGOcc(occ.formula = ~ occ.cov.1, det.formula = ~ det.cov.1, data = data.list, n.batch = 400, batch.length = 25, inits = inits, accept.rate = 0.43, cov.model = "spherical", NNGP = TRUE, n.neighbors = 10, n.report = 100, n.burn = 5000, n.thin = 5, n.chains = 1) waicOcc(out.full) waicOcc(out.no.re)
We see the WAIC is substantially smaller for the model that includes the occurrence and detection random effects. Finally, let's look at the predicted occurrence values from the full model to see how they compare to the values we used to simulate the data.
par(mfrow = c(1, 2)) plot(coords, type = "n", xlab = "", ylab = "", asp = TRUE, main = "Simulated Occurrence", bty = 'n') points(coords, pch=15, cex = 2.1, col = rgb(0,0,0,(psi-min(psi))/diff(range(psi)))) # Predicted mean occurrence values psi.means <- apply(out.full$psi.samples, 2, mean) plot(coords, type = "n", xlab = "", ylab = "", asp = TRUE, main = "Predicted Occurrence", bty = 'n') points(coords, pch=15, cex = 2.1, col = rgb(0,0,0,(psi.means-min(psi.means))/diff(range(psi.means))))
We see the model does a fairly decent job of identifying the spatial structure in occurrence across the simulated region, with our mean predicted occurrence being more smooth than the single simulated occurrence data set, as we would expect. Just like with models with only fixed effects, we can predict new values of occurrence or detection probability given a set of covariate values and spatial coordinates using the predict()
function. However, prediction becomes a bit more complicated when we have non-structured random effects in the model, as we could imagine predicting at observed levels of the random effect, or predicting at new levels of the random effect. spOccupancy
allows for prediction at both observed levels and new levels of random effects, and also allows for prediction to take into account the random effects, or simply ignore the random effects when making predictions and just generate predictions using the fixed effects. All predict()
functions for spOccupancy
objects contain the argument ignore.RE
, which is a logical value that takes value TRUE
or FALSE
. When ignore.RE = FALSE
(the default), both sampled levels and non-sampled levels of random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction, which will likely result in a more accurate estimate of occurrence/detection probability for that site. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects. Alternatively, if ignore.RE = TRUE
, the random effects will not be used for prediction and predictions will simply be generated using the fixed effects in the model.
Suppose we wish to predict occurrence at a new site, which has spatial coordinates (0.32, 0.55)
, a value of 0.34
for our occurrence covariate (occ.cov.1
) and a random effect level of 5 (which is a level that is sampled in our original data set). Below we predict occurrence at this new site, using the random effect in the prediction by setting ignore.RE = FALSE
(which we could just not specify since it is the default). Note that when predicting using random effects, the column name of the random effect must match the name of the random effect used when fitting the model.
# Create the design matrix for the new site X.0 <- cbind(1, 0.34, 5) # Make sure column names of random effects align with how we fit the model. colnames(X.0) <- c('intercept', 'occ.cov.1', 'occ.factor.1') # Coordinates of new site coords.0 <- cbind(0.32, 0.55) # Predict at new site pred.out <- predict(out.full, X.0, coords.0, ignore.RE = FALSE, verbose = FALSE) str(pred.out) # Get summary of occurrence probability at new site summary(pred.out$psi.0.samples)
Alternatively, we can just predict occurrence probability at the new site using the fixed effects only by setting ignore.RE = TRUE
.
# Remove random effect from design matrix X.0 <- X.0[, -3, drop = FALSE] pred.no.re <- predict(out.full, X.0, coords.0, ignore.RE = TRUE, verbose = FALSE) # Get summary of occurrence probability summary(pred.no.re$psi.0.samples)
Here we see a relatively small discrepancy between the predicted occurrence probability when we use the random effect and when we don't use the random effect. Predicting with random effects will allow you to fully propagate uncertainty in our predictions, but may not be desired if predicting at new locations where the level of the random effect is unknown or not sampled in the data.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.