knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Multiseason occupancy models, also called dynamic occupancy models, are models that assume closure over units that are linked by explicit colonization and extinction processes.^[As we will see, this is true even of multiseason models with autologistic occupancy dynamics] That is, sites are surveyed over a series of more than one timestep,^[It's fine if some sites are surveyed for just one timestep, but at least some sites are surveyed for multiple timesteps.] closure is assumed to hold within (but not across) timesteps, and the latent occupancy state is linked across timesteps within a site via models of local colonization and extinction processes. Following the terminology introduced here, we will refer to the units over which closure is assumed as closure-units or just plain units. Groups of units that are linked by colonization-extinction dynamics (i.e. units representing different timesteps within a site) are series.
In flocker
, we implement these models using a hidden markov model (HMM)
approach to dealing with the unobserved occupancy state in each separate series.
Calling our approach a HMM doesn't fundamentally change the model; it serves
mainly to highlight the connection between multiseason occupancy models and
well-developed techniques for efficiently computing the likelihood for a
series.^[For the curious, we use the forward algorithm to compute the
likelihood, and the forward-backward algorithm to compute unit-wise posterior
distributions for the latent occupancy state, and the forward-filtering-backward-sampling
algorithm to draw valid posterior samples for the sequence of latent states]
The HMM conceptualizes the latent occupancy state as a Markov-like process (the
process need not be strictly Markovian, because the transition probabilities can
vary by timestep), where the transition probabilities between the two possible
states (occupied or unoccupied) are a colonization probability $\mathcal{C}$,
an extinction probability $\mathcal{E}$, and their complements $1 - \mathcal{C}$
and $1 - \mathcal{E}$. The complement of the extinction probability turns out to
be particularly important; we call it the persistence probability
$\mathcal{P} = 1 - \mathcal{E}$.
To specify the HMM, we need a model for the $\mathcal{C}$, a model for $\mathcal{E}$, and a model for the initial occupancy probability $\mathcal{O}$ in the first timestep. We also need a model for the probability of observing the data in any particular timestep conditional on the true occupancy state in that timestep (in the HMM literature, this probability is referred to as an emission probability).
Emission probabilities are just fancy HMM terminology for the most familiar
part of multiseason occupancy model. They are the probability of observing the
data within a unit given the latent occupancy state, and they depend only on the
detection sub-model, which is no different from a single-season detection
sub-model. In flocker
, we assume that the logit-scale detection probabilities
vary according to a suite of covariates that can vary across all levels of
organization, including down to the visit level. This is exactly the same way
that flocker
treats the detection sub-model in a single-season model.
In "colonization-extinction" models, logit colonization probabilities $\mathcal{C}$ and logit extinction probabilities $\mathcal{E}$ are both assumed to vary according to separate predictors based on covariates that may vary across timesteps within series but never across visits within closure-units.
In "autologistic" models, the occupancy probability in timestep $t + 1$ is assumed to depend on a suite of site characteristics plus a term that depends on the latent true occupancy state at time $t$. Some treatments write the autologistic model in terms of an "occupancy probability" that depends on covariates plus the "autologistic term" that gets added on the logit scale when the site was previously occupied. However, it is easier to understand this model in terms of its colonization and extinction probabilities. In fact, the so-called "occupancy probability" mentioned above is actually a colonization probability, because it reflects the probability of occupancy conditional on non-occupancy during the previous timestep. The logit-scale sum of the colonization probability and the autologistic term gives the probability of persistence $\mathcal{P}$.
Thus, the autologistic model differs from the colonization-extinction model in that it assumes that colonization and extinction probabilities are related to one another via an offset that relates colonization to persistence. If the offset is modeled as constant, then we can think of this model as ascribing a "suitability" to each site that defines the position of a pair of probabilities $\mathcal{C}$ and $\mathcal{P}$ that differ by a constant on the logit scale. In practice, $\mathcal{P}$ is almost always larger than $\mathcal{C}$, and so the offset is positive.
In addition to the standard autologistic model that treats the offset as a
constant, flocker
also affords the flexibility to model the offset via its
own covariate-based predictor, which reintroduces flexibility in the
relationship between $\mathcal{P}$ and $\mathcal{C}$, but tends to retain the
idea that these probabilities ought to be related to one another and ought to
tend to move in tandem.
Most examples of multiseason occupancy models in the literature include an
explicit model for the initial occupancy probability $\mathcal{O}$ via its own
covariate-based predictor, which is analogous to the occupancy predictor
in a single-season model. Thus, a colonization-extinction model would contain
predictors for detection, colonization, extinction, and initial
occupancy; an autologistic model would contain linear predictors for detection,
colonization, the colonization-persistence offset (often modeled as a constant),
and initial occupancy. flocker
can fit either model.
Borrowing from the HMM literature, flocker
also comes with optional
functionality to handle $\mathcal{O}$ differently. Any pair of time-invariant
colonization and extinction probabilities defines an equilibrium proportion of
sites that are occupied. If colonization and extinction remain constant for a
long time prior to the first timestep, the initial occupancy probability should
simply be this equilibrium frequency, eliminating the need to parameterize
and fit initial occupancy separately from colonization and extinction.
flocker
is equipped to fit models without any separate predictor for initial
occupancy, and instead to begin with the equilibrium frequencies implied by the
colonization and extinction probabilities modeled for the first timestep.
Note that this approach is sometimes valid even if the colonization and
extinction probabilities begin to vary during the timeseries being modeled,
as long as they are invariant for a substantial period up to and including the
first year modeled. As a concrete example, consider a study system that begins
in an equilibrium condition, but in which some sites burn in a forest fire
partway through the study. If the impact of the burn on colonization and
extinction probabilities is modeled adequately,
then the modeled probabilities prior to the burn will reflect the equilibrium
colonization and extinction probabilities in the pre-burn state. It is for this
reason that, when using equilibrium frequencies as initial occupancy
probabilities, flocker
always computes the equilibrium based on the
probabilities modeled for the first timestep.
As an aside, we somewhat regularly encounter confusion that conflates initial occupancy probabilities $\mathcal{O}$ with colonization probabilities $\mathcal{C}$. This confusion apparently arises from the notion that the autologistic model consists of a single-season occupancy model plus an autologistic term, which leads people to incorrectly surmise that the linear predictor without the autologistic term gives an occupancy probability. It does not; it gives a colonization probability, which is typically much lower. Any modeling exercise that enforces equality between initial occupancy probabilities and subsequent colonization probabilities should be viewed skeptically.
Let's simulate some data that is valid for use with multiseason models. Here,
we will simulate data for three seasons with one unit covariate and one event
covariate. The data will be simulated under a colonization-extinction model with
explicit inits, but we will be able to fit other models (autologistic,
equilibrium inits) to the same data.^[note that simulate_flocker_data()
can
also simulate directly from these other model types.]
library(flocker) multi_data <- simulate_flocker_data( n_season = 3, n_pt = 300, n_sp = 1, multiseason = "colex", multi_init = "explicit", seed = 1 ) fd <- make_flocker_data( multi_data$obs, multi_data$unit_covs, multi_data$event_covs, type = "multi", quiet = TRUE )
Here's the colonization-extinction model with an explicit model for occupancy in the first timestep. Depending on hardware, fitting this model might take 1-5 minutes.
multi_colex <- flock( f_occ = ~ uc1, f_det = ~ uc1 + ec1, f_col = ~ uc1, f_ex = ~ uc1, flocker_data = fd, multiseason = "colex", multi_init = "explicit", backend = "cmdstanr", cores = 4 )
Here's the colonization-extinction model using equilibrium occupancy probabilities in the first timestep:
multi_colex_eq <- flock( f_det = ~ uc1 + ec1, f_col = ~ uc1, f_ex = ~ uc1, flocker_data = fd, multiseason = "colex", multi_init = "equilibrium", backend = "cmdstanr", cores = 4 )
Here's the autologistic model with explicit occupancy probabilities in the
first timestep. To reflect the stereotypical autologistic model with a constant
logit-scale offset separating colonization and persistence probabilities, we
use the formula f_auto = ~ 1
, but it is fine to relax this constraint and use,
e.g. f_auto = ~ uc1
.
multi_auto <- flock( f_occ = ~ uc1, f_det = ~ uc1 + ec1, f_col = ~ uc1, f_auto = ~ 1, flocker_data = fd, multiseason = "autologistic", multi_init = "explicit", backend = "cmdstanr", cores = 4 )
And finally the autologistic model with equilibrium occupancy probabilities in the first timestep:
multi_auto_eq <- flock( f_det = ~ uc1 + ec1, f_col = ~ uc1, f_auto = ~ 1, flocker_data = fd, multiseason = "autologistic", multi_init = "equilibrium", backend = "cmdstanr", cores = 4 )
We can reconstruct the occupancy probabilities at every unit using the
get_Z
function:
Z <- get_Z(multi_colex_eq)
We can generate posterior predictions:
pp <- predict_flocker(multi_colex_eq)
We can compare the fit of these various models using approximate leave-one-out cross-validation, where the holdouts consist of entire series (leave-one-series-out cross-validation).
loo_out <- loo_compare_flocker( list(multi_colex, multi_colex_eq, multi_auto, multi_auto_eq) ) loo_out
See the main tutorial vignette for an example of the resulting output (this vignette does not actual run the models to save on computational resources).
{ width=30% style="border:none;" }
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.