In r-glennie/openpopscr: Open Population Spatial Capture-Recapture

knitr::opts_chunk$set(
  eval = FALSE, 
  collapse = TRUE,
  fig.width = 7, 
  fig.height = 7, 
  comment = "#>"
)

In this vignette, I describe how Cormack-Jolly-Seber (CJS) spatial capture-recapture (SCR) models can be used within an R6 object of the CjsModel class. I recommend you read the vignettes on the ScrData and ScrModel classes before reading this one. The folder inst/examples/ in the repository contains example code.

library(openpopscr)
library(secr)

data("cjsmodel_example")

Simulating data

The openpopscr package uses the ScrData class to store data. See the vignette on the ScrData class to see how to create the data object from your own raw data. In this section, I simulate a CJS SCR survey. The simulation function is useful when you want to try out a new method where you know what the correct parameter estimates are, you want to test how robust a method is to violations of its assumptions, or you want to see how data simulated from a model compare with real data.

Cormack-Jolly-Seber models are used to describe the pattern of detections for individuals that are known to be alive from some occasion onwards. That is, one conditions on the first detection of the individual and estimates survival and detectability from the repeated detections after that. No inference is extended to the entire population other than the assumption that the survival probability for captured individuals is the same as for the population as a whole.

To simulate a survey, I require true detection and survival parameters, a detector layout, and a mesh.

# set true parameters 
true_par <- list(lambda0 = 0.5, sigma = 20, phi = 0.7)

# set abundance 
N <- 100

# make detectors array 
detectors <- make.grid(nx = 7, ny = 7, spacing = 20, detector = "count")

# make mesh 
mesh <- make.mask(detectors, buffer = 100, nx = 64, ny = 64, type = "trapbuffer")

# set number of occasions to simulate
n_occasions <- 5

To simulate the data, I set a random seed so that the results will be the same every time the code is run. This is so that you, the reader, can re-run the code and produce the same output as me. In practice, you may not want to set the seed if you want the output from the simulation to change randomly.

scrdat <- simulate_cjs_openscr(true_par, N, n_occasions, detectors, mesh, seed = 19539)

You will notice the simulate_cjs_openscr function produces useful output to let the user know what is going on. If you want to switch this off, for example if you are running lots of simulations and do not want this output being produced over and over, then you can set print = FALSE just as I have set the seed above.

In the simulated survey, r scrdat$n() unique individual were detected at least once. More summary information can be found by typing the name of the object into the console:

scrdat

Fitting a model

To create a CjsModel, you need to specify a formula for each detection parameter and the survival parameter; also, you need starting values for all parameters. It is best to choose starting values for the parameters that are biologically reasonable and match your data. You could simulate some data using proposed starting values as true parameters to see if they lead to similar data. Poor starting values can lead to spurious convergence in complicated models, models with many parameters. The function get_start_value can be used to produce a reasonable guess for each parameter.

# set each parameter to a be constant 
form <- list(lambda0 ~ 1, 
             sigma  ~ 1,
             phi ~ 1)

# get some starting values based on data 
start <- get_start_values(scrdat, model = "CjsModel")

# check starting values are reasonable, they seem ok 
start

Before fitting the model, I create the model object:

mod <- CjsModel$new(form, scrdat, start)

Once the model object has been created, you can fit the model using the fit function:

mod$fit()

If you have the output printed, the "checking convergence" message reports whether convergence has been reached and if it has not it reports the code output by optim that tells you why it failed to converge.

Reading model output

Once a model has been fit, you can type its name into the console and some summary results are printed:

mod

The parameters are reported on the link scale; this is because when covariates are involved, it is easier to interpret their effects. The package openpopscr uses the log-link function for all of these parameters. So, you would take the exponential of these reported numbers to obtain the parameter values on the response scale.

Alternatively, you can use the get_par function:

mod$get_par("lambda0", k = 1, j = 1)
mod$get_par("sigma", k = 1, j = 1)
mod$get_par("phi", k = 1)

I specify $k = 1, j = 1$ in the functions above for $\sigma, \lambda_0$ because by default get_par returns the value of these parameters for every occasion and detector. This is useful when they vary by occasion, but here they are constants, so I only want their value on a single occasion, here I chose the first occasion.

If you want to access the reported results, they are stored as a list which is returned by the function estimates:

mod$estimates()

The model AIC can be computed in the usual way:

AIC(mod)

Covariates

The detection parameters can change with observed covariate values. Covariates can change by detector or by occasion. At present, individual-level covariates are not implemented in openpopscr. By default, the covariate $t$ is created to allow for parameters to depend on occasion.

As an example, one can fit a model where survival changes depending on anthropogenic distriburance (measured by some continuous covariate):

# simulate a disturbance covariate (% deforestation for example)
set.seed(385638)
disturb <- runif(scrdat$n_occasions()) 
# add covariate 
scrdat$add_covariate("disturb", disturb, "k")
# copy formulae from previous model
form_surv <- form
# update formula for phi to include age 
form_surv[[3]]<- phi ~ disturb

# create new model object 
mod_surv <- CjsModel$new(form_surv, scrdat, start)
# fit model
mod_surv$fit()

# look at results
mod_surv
# predict phi for each occasion
mod_surv$get_par("phi")

Covariates that change over time, by detector, and by detector-time can be included in the same way. See ScrData and ScrModel vignettes for more details on loading covariates into the ScrData object and fitting models with covariates.

Time and Robust Design

There are two further arguments you can give when creating a ScrData object: time and primary.

The time option requires a vector that contains the start time of each occasion. The time unit is the user's choice. For example, if occasions occur in Weeks 1, 3, 4 of a particular month, this could be specified in a time vector as c(0, 3, 4) (in weeks) or c(0, 14, 21) (in days).

scrdat <- ScrData$new(capthist, mesh, time = time)

Once time is specified, the survival estimates will be the estimated probability of survival in one time unit, so it will depend on what time unit you used.

The primary argument relates to robust design surveys. In a robust design, there are both primary and secondary occasions for sampling. Secondary occasions take place close together in time and it is assumed the population is closed between them. Population change occurs between primary occasions. The survival parameter between primary occasions is of interest.

For ScrData, you supply a single capthist object (combined across all primary occasions) and then specify in the primary argument which primary each secondary occasion belongs to. For example, suppose a survey has 3 primary occasions with 2, 4, 4 secondary occasions respectively. The capthist object would then have 10 occasions in it and the primary vector would be c(1,1,2,2,2,2,3,3,3,3)

scrdat <- ScrData$new(capthist, mesh, primary = primary)

An example of a robust design analysis is included inst/examples/cjs_robust_example.R

Transience

For Cjs models, transience can play an important role. When individuals are transient, their activity can pass in and out of range of detection; individuals that permanently pass out of range of detection are best explained, by the CjsModel, as having died and so survival probability is underestimated. This forces one to estimate only apparent survival, not true survival. When transience is accounted for, one can more defendably state that they have estimated true survival.

You can simulate Cjs SCR surveys with transience:

# set truth 
true_par <- list(lambda0 = 1, sigma = 20, phi = 0.7, sd = 10)
# make detectors array 
detectors <- make.grid(nx = 7, ny = 7, spacing = 20, detector = "count")
# make mesh 
mesh <- make.mask(detectors, buffer = 100, nx = 64, ny = 64, type = "trapbuffer")
# set number of occasions to simulate
n_occasions <- 5
# set number of individuals tracked
N <- 100
# simulate ScrData 
scrtransdat <- simulate_cjs_openscr(true_par, 
                                    N, 
                                    n_occasions, 
                                    detectors, 
                                    mesh, 
                                    move = TRUE, 
                                    seed = 95811)

I will first fit a stationary CJSModel to these data:

# set formulae and start values 
stat_form <- list(lambda0 ~ 1, sigma ~ 1, phi ~ 1)
start <- get_start_values(scrtransdat, model = "CjsModel")

# create model object 
stat <- CjsModel$new(stat_form, scrtransdat, start)
# fit model 
stat$fit()

# look at results 
stat
# look at parameters on response scale 
stat$get_par("lambda0", k = 1, j = 1)
stat$get_par("sigma", k = 1, j = 1)
stat$get_par("phi", k = 1)

Given we know the truth, the estimates are a little off: $\lambda_0$ is far too low, $\sigma$ is too high, and $\phi$ is a litte low.

I now fit the transient model. The transient model class is called CjsTransientModel. We must also now specify a start value for the movement parameter.

# specify formulae for parameters 
form <- list(lambda0 ~ 1, 
             sigma  ~ 1,
             phi ~ 1,
             sd ~ 1)

start <- get_start_values(scrtransdat, model = "CjsTransientModel")

trans <- CjsTransientModel$new(form, scrtransdat, start)

I now fit the model as usual:

# fit model 
trans$fit()

# look at results 
trans
# look at parameters on response scale 
trans$get_par("lambda0", k = 1, j = 1)
trans$get_par("sigma", k = 1, j = 1)
trans$get_par("sd", k = 1)
trans$get_par("phi", k = 1)