options(rmarkdown.html_vignette.check_title = FALSE)

Introduction

The @Rota2016 occupancy model is designed for presence/absence datasets with two or more (potentially) interacting species. The model allows for estimation of occupancy probabilities and the strength of interactions between species, as well as covariate effects on these parameters. The model generalizes the standard single-species occupancy model from @mackenzie_estimating_2002.

The model assumes the latent occupancy state at site $i$ for a set of $s$ potentially interacting species is a vector $Z_i$ of length $s$ containing a sequence of the values 0 or 1. For example, when $s = 2$, the possible states are [11], [10], [01], or [00], corresponding to both species present, only species 1 or species 2 present, or both species absent, respectively. The latent state modeled as a multivariate Bernoulli random variable: $$ Z_i \sim \mathrm{MVB}(\psi_i) $$ where $\psi_i$ is a vector of length $2^s$ containing the probability of each possible combination of 0s and 1s, such that $\sum \psi_i = 1$.

For $s = 2$, the corresponding natural parameters, $f$, are

$$ \begin{split} f_1 &= \mathrm{log}(\psi_{10}/\psi_{00}) \
f_2 &= \mathrm{log}(\psi_{01}/\psi_{00}) \
f_{12} &= \mathrm{log}((\psi_{11} * \psi_{00})/(\psi_{10} * \psi_{01})) \end{split} $$

The natural parameters can then be modeled as linear functions of covariates.

The observation process is similar to the standard single-species occupancy model, except that the observations $y_{ij}$ at site $i$ on occasion $j$ are vectors of length $s$ and there are independent values of detection probability $p$ for each species $s$: $$ y_{ij} | Z_i \sim \mathrm{Bernoulli}(Z_i * p_{sij}) $$ See @Rota2016 for more details on model structure. In unmarked, the model can be fit with the occuMulti function.

Simple multispecies analysis

We will start with a simple analysis using presence/absence data from camera traps for three species: coyote, red fox, and bobcat. The data are a simplified version of the data used in @Rota2016, with the data collapsed three into three-week long time periods.

Formatting the data

The dataset is included with unmarked and is called MesoCarnivores. First, we need to load in the dataset, which is a list with several components.

library(unmarked)
data(MesoCarnivores)
names(MesoCarnivores)

Presence/absence matrices for the three species are in list elements bobcat, coyote, and redfox, and sitecovs contains the site-level covariate data.

Using this information, we will create an unmarkedFrameOccuMulti object. You can get more information by looking at the help file for unmarkedFrameOccuMulti:

?unmarkedFrameOccuMulti

First we combine the detection data for the 3 species into one named list. These names will be used throughout the multispecies analysis, so pick useful ones.

ylist <- list(bobcat=MesoCarnivores$bobcat, coyote=MesoCarnivores$coyote, 
              redfox=MesoCarnivores$redfox)
lapply(ylist, head)

The site covariates are contained in a data frame:

head(MesoCarnivores$sitecovs)

The site covariates include a metric of disturbance in a 5 km radius (Dist_5km), housing density in a 5 km radius (HDens_5km), latitude, longitude, a metric of how many people use the site (People_site) and whether the camera site is on (or off) a trail. Using our ylist, site covariates, and observation covariates, we can construct an unmarkedFrameOccuMulti object.

umf <- unmarkedFrameOccuMulti(y=ylist, siteCovs=MesoCarnivores$sitecovs)

Occupancy formulas

While most unmarked models have only one or two formulas (and a single formula for occupancy or abundance), occuMulti requires one formula per natural parameter $f$. Thus, there will be multiple formulas associated with occupancy. These formulas are organized into an ordered character vector. It can be hard to keep track of how many natural parameters there are and what each one represents. It can be helpful to look at the $f$-design matrix, which is generated by unmarkedFrameOccuMulti.

umf@fDesign

The number and order of the formulas in the vector should match the column names of this matrix. There are 7 columns in the matrix: thus, we'll need 7 formulas total, and they should be provided in the following order:

colnames(umf@fDesign)

For this model we'll set the 1st and 2nd-order $f$ parameters to be intercept-only, and fix the 3rd order parameter at 0. We will combine our formulas into a vector called stateformulas, like this:

stateformulas <- c("~1","~1","~1","~1","~1","~1","0")

Notice that the formulas are character strings (each wrapped in ""). This is required.

Detection formulas

Each species has its own detection probability formula. Thus, there should be 3 total formulas combined in a detformulas vector. The order of the formulas should match the order of species in ylist. For this model, all three species will have intercept-only detection formulas.

detformulas <- c("~1","~1","~1")

Fit the model

First, look at the help file for occuMulti to check what the required arguments are:

?occuMulti

We now have all the pieces we need (unmarkedFrameOccuMulti, stateformulas, detformulas) needed to run a basic model which we will call mod_null.

mod_null <- occuMulti(detformulas=detformulas, stateformulas=stateformulas, data=umf)
summary(mod_null)

The regression parameters associated with each $f$ are identified by the species name (or combination of species names) in brackets.

A few things to notice:

Occupancy probabilities

To get the expected probability for each occupancy state ([11], [10] and so on) at each site, use the predict function. This gives you the probabilities along with standard errors and a 95% CI.

occ_prob <- predict(mod_null, type="state")
head(occ_prob$Predicted)

The rows of this matrix should sum to 1. All rows are the same because estimated occupancies at all sites are the same - we didn't include any covariates.

Marginal occupancy

It's often more interesting to look at occupancy for species individually. For example, you might want to know the marginal occupancy of one species at each site (across all possible occupancy states).

You can do this by specifying the species argument in predict, which will automatically sum up the appropriate occupancy states.

redfox_marginal <- predict(mod_null, type="state", species="redfox")
head(redfox_marginal)

Plotting marginal occupancy

Outputs from predict can be used to compare marginal occupancy across species with a plot. First, we'll need to get marginal occupancy for the other two species, and combine the three estimates into a single data frame.

coy_marginal <- predict(mod_null, type="state", species="coyote")
bob_marginal <- predict(mod_null, type="state", species="bobcat")
all_marginal <- rbind(redfox_marginal[1,], coy_marginal[1,], bob_marginal[1,])
all_marginal$Species <- c("Red fox", "Coyote", "Bobcat")

Now we can plot the estimated marginal occupancy for each species, along with 95\% CIs.

plot(1:3, all_marginal$Predicted, ylim=c(0.1,0.4), 
     xlim=c(0.5,3.5), pch=19, cex=1.5, xaxt='n', 
     xlab="", ylab="Marginal occupancy and 95% CI")
axis(1, at=1:3, labels=all_marginal$Species)

# CIs
top <- 0.1
for (i in 1:3){
  segments(i, all_marginal$lower[i], i, all_marginal$upper[i])
  segments(i-top, all_marginal$lower[i], i+top)
  segments(i-top, all_marginal$upper[i], i+top)
}

Conditional occupancy

Alternatively, you might want to know the probability of occupancy of one species, conditional on the presence of another. Use the species and cond arguments together for this. For example, the probability of red fox occupancy, conditional on coyote presence:

redfox_coy <- predict(mod_null, type="state", species="redfox", cond="coyote")
head(redfox_coy)

What about conditional on coyote absence? Simply add a "-" before the species name.

redfox_nocoy <- predict(mod_null, type="state", species="redfox", cond="-coyote")
head(redfox_nocoy)

Plotting conditional occupancy

You can use this output from predict to generate comparison plots.

cond_data <- rbind(redfox_coy[1,], redfox_nocoy[1,])
cond_data$Coyote_status <- c("Present","Absent")

plot(1:2, cond_data$Predicted, ylim=c(0,0.3), 
     xlim=c(0.5,2.5), pch=19, cex=1.5, xaxt='n', 
     xlab="Coyote status", ylab="Red fox occupancy and 95% CI")
axis(1, at=1:2, labels=cond_data$Coyote_status)

# CIs
top <- 0.1
for (i in 1:2){
  segments(i, cond_data$lower[i], i, cond_data$upper[i])
  segments(i-top, cond_data$lower[i], i+top)
  segments(i-top, cond_data$upper[i], i+top)
}

Note that red fox occupancy is higher at sites where coyotes were present, which corresponds with the positive interaction term between the two species we saw in the model output summary.

Multispecies model with covariates

Now we'll fit a model with covariates on some natural parameters. It might be helpful to look at the order for our $f$ parameters again:

colnames(umf@fDesign)

And our available site covariates:

head(siteCovs(umf))

Add housing density as a covariate

We'll fit a model with an effect of housing density (HDens_5km) on the first-order parameters for all three species. The two-way interactions will remain intercept-only, and the three-way interaction will remain fixed at 0. Here's the vector of $f$ formulas:

sf <- c("~HDens_5km","~HDens_5km","~HDens_5km","~1","~1","~1","0")

Inside your formula, you can wrap the variable name in \code{scale()} to standardize it, which we should do, because the housing density variable is not scaled.

sf <- c("~scale(HDens_5km)","~scale(HDens_5km)","~scale(HDens_5km)","~1","~1","~1","0")

Detection formulas will remain the same, so we're now ready to fit a new model, mod_hdens.

mod_hdens <- occuMulti(stateformulas=sf, detformulas=detformulas, umf)
summary(mod_hdens)

A few things to note from the results:

Plotting covariate effects

To plot the effect of housing density on marginal occupancy, we again use predict. First, we need to generate sequence of possible Hdens_5km values for the X-axis of our plot. We'll generate a sequence of 100 values, starting at the minimum observed housing density and ending at the maximum observed value.

hdens_range <- range(siteCovs(umf)$HDens_5km)
hdens_seq <- seq(hdens_range[1], hdens_range[2], length.out=100)

Next, we'll predict marginal coyote occupancy at each value of Hdens_5km along our sequence. Our custom housing density values should be in a data frame and provided to the newdata argument. Because we used scale() in our formulas above, there is no need to manually scale these new housing density values - unmarked will do it for us.

nd <- data.frame(HDens_5km = hdens_seq)
occ_hdens_coy <- predict(mod_hdens, type="state", species="coyote", newdata=nd)
occ_hdens_coy$Species <- "Coyote"
occ_hdens_coy$Hdens <- hdens_seq
head(occ_hdens_coy)

We'll do the same thing for the other two species.

occ_hdens_bob <- predict(mod_hdens, type="state", species="bobcat", newdata=nd)
occ_hdens_fox <- predict(mod_hdens, type="state", species="redfox", newdata=nd)
occ_hdens_bob$Species <- "Bobcat"
occ_hdens_fox$Species <- "Red fox"
occ_hdens_bob$Hdens <- hdens_seq
occ_hdens_fox$Hdens <- hdens_seq

Finally, we'll build our plot. Housing density will be on the x-axis, marginal occupancy on the y-axis, and species will be identified by colors.

plot(occ_hdens_coy$Hdens, occ_hdens_coy$Predicted, type='l', ylim=c(0,0.6),
     col='red', lwd=2, xlab="Housing density", ylab="Marginal occupancy")
lines(occ_hdens_bob$Hdens, occ_hdens_bob$Predicted, col='blue', lwd=2)
lines(occ_hdens_fox$Hdens, occ_hdens_fox$Predicted, col='green', lwd=2)
legend('topleft', col=c('red', 'blue', 'green'), lty=1,
       legend=c("Coyote", "Bobcat", "Red fox"))

Model selection

unmarked can calculate AIC, $\Delta$AIC, and weights for a list of models automatically. Start by creating a fitList object containing our two models:

mods <- fitList(mod_null, mod_hdens)

Then call the function modSel on our list of models to generate a model selection table:

modSel(mods)

Looks like the model with housing density is better.

Model fitting challenges

Multispecies occupancy models often have many parameters, and can be difficult to fit in some situations. You might get poor estimates (i.e., very large absolute values and/or large SEs) under certain conditions:

Here's an example of a complex model with many covariates that results in poor estimates.

state_complex <- c(rep("~scale(Dist_5km)+scale(HDens_5km)", 6), 0)
det_complex <- rep("~Trail",3)

mod_complex <- occuMulti(stateformulas=state_complex, detformulas=det_complex, umf)
summary(mod_complex)
##
## Call:
## occuMulti(detformulas = det_complex, stateformulas = state_complex,
##     data = umf, maxOrder = 3L)
##
## Occupancy (logit-scale):
##                                  Estimate     SE      z  P(>|z|)
## [bobcat] (Intercept)             -23.0171  5.784 -3.980 6.90e-05
## [bobcat] scale(Dist_5km)          -2.4249  0.689 -3.519 4.34e-04
## [bobcat] scale(HDens_5km)        -82.3836 19.788 -4.163 3.14e-05
## [coyote] (Intercept)              -0.6789  0.225 -3.017 2.55e-03
## [coyote] scale(Dist_5km)          -0.0176  0.139 -0.127 8.99e-01
## [coyote] scale(HDens_5km)         -0.5534  0.748 -0.740 4.59e-01
## [redfox] (Intercept)              -1.3946  0.257 -5.425 5.78e-08
## [redfox] scale(Dist_5km)          -0.5293  0.250 -2.115 3.45e-02
## [redfox] scale(HDens_5km)          0.2108  0.261  0.808 4.19e-01
## [bobcat:coyote] (Intercept)        6.7598  6.384  1.059 2.90e-01
## [bobcat:coyote] scale(Dist_5km)    1.6979  0.695  2.445 1.45e-02
## [bobcat:coyote] scale(HDens_5km)  17.9202 21.442  0.836 4.03e-01
## [bobcat:redfox] (Intercept)       15.3983  3.462  4.448 8.67e-06
## [bobcat:redfox] scale(Dist_5km)    0.8836  0.439  2.014 4.40e-02
## [bobcat:redfox] scale(HDens_5km)  64.1330 12.377  5.182 2.20e-07
## [coyote:redfox] (Intercept)        1.1084  0.363  3.050 2.29e-03
## [coyote:redfox] scale(Dist_5km)    0.1149  0.340  0.338 7.35e-01
## [coyote:redfox] scale(HDens_5km)   0.9046  0.781  1.159 2.47e-01
## 
## Detection (logit-scale):
##                      Estimate     SE      z  P(>|z|)
## [bobcat] (Intercept)    -2.83 0.1419 -19.91 3.10e-88
## [bobcat] Trail           1.74 0.1542  11.26 2.10e-29
## [coyote] (Intercept)    -1.96 0.0984 -19.88 5.89e-88
## [coyote] Trail           2.17 0.1220  17.75 1.63e-70
## [redfox] (Intercept)    -1.59 0.1601  -9.93 2.97e-23
## [redfox] Trail           1.78 0.1997   8.93 4.12e-19
## 
## AIC: 5958.196
## Number of sites: 1437
## optim convergence code: 0
## optim iterations: 196
## Bootstrap iterations: 0

Note that several estimates are very large (>10) and also have large SEs. You should be very skeptical about using a model with poor estimates, like this one, for inference. Potential solutions when you get poor estimates include:

Penalized likelihood

occuMulti uses maximum likelihood to estimate parameters. We can add a "penalty" to the calculated likelihood to keep parameter estimates from getting stuck at huge values Use of penalized likelihood has been shown to help with separation/boundary issues, eliminate unreasonably large estimates, and reduce error, However, note that the penalty term introduces a small amount of bias in the parameter estimates: thus, we are making a tradeoff between bias and variance With huge SEs as with the previous model, this may be a good tradeoff to make.

One type of penalty is the "Bayes" penalty: $$ -\lambda\frac{1}{2}\sum_i{}\theta_i^2 $$ In the formula above, $\lambda$ is the penalty value, and $\theta$ is the vector of estimated parameters. As the parameter values get bigger, the total penalty increases.

Penalized likelihood with occuMulti

occuMulti can use penalized likelihood to fit models. You can provide a value to the penalty argument directly, or use the optimizePenalty function on a fitted model to choose the best value of $\lambda$ using K-fold cross-validation, and re-fit the model using the optimal penalty term value.

set.seed(123)
mod_penalty <- optimizePenalty(mod_complex, penalties=c(0.5,1))
summary(mod_penalty)
## Optimal penalty is 1
## Bootstraping covariance matrix

## Call:
## occuMulti(detformulas = c("~Trail", "~Trail", "~Trail"), stateformulas = c("~scale(Dist_5km)+scale(HDens_5km)",
## "~scale(Dist_5km)+scale(HDens_5km)", "~scale(Dist_5km)+scale(HDens_5km)",
## "~scale(Dist_5km)+scale(HDens_5km)", "~scale(Dist_5km)+scale(HDens_5km)",
## "~scale(Dist_5km)+scale(HDens_5km)", "0"), data = object@data,
##     maxOrder = 3L, penalty = 1, boot = boot)

## Occupancy (logit-scale):
##                                  Estimate    SE      z  P(>|z|)
## [bobcat] (Intercept)              -1.7810 0.221 -8.054 8.00e-16
## [bobcat] scale(Dist_5km)          -1.3143 0.337 -3.903 9.48e-05
## [bobcat] scale(HDens_5km)         -2.8200 0.529 -5.334 9.61e-08
## [coyote] (Intercept)              -0.6049 0.178 -3.407 6.56e-04
## [coyote] scale(Dist_5km)           0.0285 0.150  0.190 8.49e-01
## [coyote] scale(HDens_5km)         -1.0908 0.397 -2.748 5.99e-03
## [redfox] (Intercept)              -1.5659 0.310 -5.059 4.22e-07
## [redfox] scale(Dist_5km)          -0.3068 0.138 -2.226 2.60e-02
## [redfox] scale(HDens_5km)          0.4730 0.797  0.593 5.53e-01
## [bobcat:coyote] (Intercept)        1.1871 0.372  3.195 1.40e-03
## [bobcat:coyote] scale(Dist_5km)    0.9347 0.368  2.537 1.12e-02
## [bobcat:coyote] scale(HDens_5km)  -0.3218 1.043 -0.309 7.58e-01
## [bobcat:redfox] (Intercept)       -0.8831 0.346 -2.553 1.07e-02
## [bobcat:redfox] scale(Dist_5km)    0.0364 0.233  0.156 8.76e-01
## [bobcat:redfox] scale(HDens_5km)   2.5609 1.074  2.384 1.71e-02
## [coyote:redfox] (Intercept)        1.0001 0.249  4.009 6.09e-05
## [coyote:redfox] scale(Dist_5km)    0.0236 0.229  0.103 9.18e-01
## [coyote:redfox] scale(HDens_5km)   1.3920 0.424  3.281 1.03e-03

## Detection (logit-scale):
##                      Estimate    SE      z  P(>|z|)
## [bobcat] (Intercept)    -2.44 0.150 -16.22 3.72e-59
## [bobcat] Trail           1.74 0.164  10.61 2.59e-26
## [coyote] (Intercept)    -1.89 0.117 -16.24 2.72e-59
## [coyote] Trail           2.10 0.150  14.00 1.52e-44
## [redfox] (Intercept)    -1.49 0.206  -7.21 5.66e-13
## [redfox] Trail           1.72 0.254   6.79 1.14e-11

## AIC: 6135.555
## Number of sites: 1437
## optim convergence code: 0
## optim iterations: 100
## Bootstrap iterations: 30

Notice that parameter estimates and SEs for the model using penalized likelihood are no longer gigantic. For more info on the use of penalized likelihood with multispecies occupancy models, see @Clipp_2021.

References



rbchan/unmarked documentation built on April 3, 2024, 10:11 p.m.