knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )  # Summary {-} We describe an R package called marked for analysis of mark-recapture data. Currently, the package is capable of fitting Cormack-Jolly-Seber (CJS) models with maximum likelihood estimation (MLE) and Bayesian Markov Chain Monte Carlo (MCMC) methods. In addition, Jolly-Seber (JS) models can be fitted with MLE. Additional models will be added as the package is updated. We provide some example analyses with the well-known dipper data and compare run timing and results with MARK. For analysis with several thousand or more capture histories and several time-varying covariates, the run times with marked are substantially less. By providing this open source software, we hope to expand the analyst's toolbox and enble the knowledgeable user to understand fully the models and software. # Introduction {-} Currently the most comprehensive software for analysis of capture-recapture data is MARK [@White1999]. MARK is a FORTRAN program for fitting capture-recapture models that are manually constructed with a graphical user interface. RMark [@Laake2008] is an R package that constructs models for MARK with user-specified formulas to replace the manual model creation. With RMark and MARK most currently available capture-recapture models can be fitted and manipulated in R. Additional R packages for analysis of capture-recapture data have been made available including Rcapture [@Baillargeon2007], mra [@McDonald2005], secr [@Borchers2008], BTSPAS [@Schwarz2009], SPACECAP [@Royle2009], and BaSTA [@Colchero2012]. Rcapture fits closed and open models in a log-linear framework. The mra package fits Cormack-Jolly-Seber (CJS) and the Huggins closed model with a regression approach'' to model specification. The secr and SPACECAP packages provide spatially explicit modeling of closed capture-recapture data and BTSPAS fits time-stratified Petersen models in a Bayesian framework. BaSTA estimates survival with covariates from capture-recapture/recovery data in a Bayesian framework when many individuals are of unknown age. Each package is designed for a unique niche or structure. We believe these alternative packages in R are useful because they expand the analyst's toolbox and the code is open source which enables the knowledgeable user to understand fully what the software is doing. Also, this independent innovation provides testing for existing software and potential gains in developer knowledge to improve existing software. We developed this R package we named marked for analysis with marked animals in contrast to the R package unmarked [@Fiske2011] that focuses on analysis with unmarked animals. The original impetus for the package was to implement the CJS model using the hierarchical likelihood construction described by @Pledger2003 and to improve on execution times with RMark/MARK [@White1999;@Laake2008] for analysis of our own large data sets with many time-varying individual (animal-specific) covariates. Subsequently, we implemented the Jolly-Seber model with the @Schwarz1996 POPAN structure where the hierarchical likelihood construction idea extended to the entry of animals into the population. We also added a Bayesian Markov Chain Monte Carlo (MCMC) implementation of the CJS model based on the approach used by @ALBERT:1993fk for analyzing binary data with a probit regression model. # Background {-} We assume you know and understand mark-recapture terminology and models. For background material on capture-recapture and MARK refer to http://www.phidot.org/software/mark/docs/book/. In the appendix, we provide a comparison of the MARK and marked data and model structures and details on likelihood construction for the CJS and JS models. Our focus in the vignette will be to provide examples on the use of marked. # Dipper data example {-} Anyone who has used RMark will find it easy to use marked because it has nearly identical syntax and structure with a few minor differences. The structure of the input dataframe for marked is also identical to RMark. Any dataframe with a character field named ch containing the capture history will work with marked. If the capture history represents more than one animal then the dataframe should contain the additional numeric field named freq, which is the number of animals represented by that capture history. However, it is not necessary to accumulate capture histories in the input data because the process.data step will accumulate duplicated records by default. There can be any number of additional fields, some of which can be used as covariates in the analysis. Some of the functions in marked and RMark have the same name (e.g., process.data, make.design.data), so only one of the packages should be loaded in R to avoid aliasing and resulting errors. We will start by fitting the simplest CJS model with constant$\phi$and$p$. The dipper data contain 294 records with the capture history (ch) and a factor variable sex with values Female or Male. Models are fitted with the function crm ([c]apture-[r]ecapture [m]odel). After a call to library to attach the package, and data to retrieve the dipper data from the package, the model is fitted with crm and assigned to the object named model: library(marked) library(ggplot2) data(dipper) model=crm(dipper)  For this example, we are using the default CJS model and a default formula of ~ 1 (constant) for each parameter. The function crm calls three functions in turn: 1) process.data to process the data, 2) make.design.data to create the list of parameter-specific dataframes, and 3) cjs, to fit the CJS model with the defined formulas. The code reports progress and some results as it executes. We'll suppress these messages in later examples but show them here to explain some of the messages. When it processes the data, it collapses the 294 rows into 55 which are the unique number of rows in the data including ch and sex. After processing, it creates the design data and the design matrices for each parameter. Prior to the optimization, it also collapses the histories further from 55 to 32 which it can do here because sex is not used in either formula. The final steps are to compute initial values for the parameters and find the MLEs with the selected optimization method(s). As it progresses, the value of -2log-likelihood is reported every 100 evaluations of the objective function. It is not shown above because there were fewer than 100 function evaluations for this example. A brief listing of the model results is obtained by typing model which invokes the function print.crm because class(model)="crm". model  The output includes the number of parameters (npar), -2log-likelihood, Akaike's Information Criterion (AIC), and the estimates for$\phi$and$p$. Estimates of precision are not shown because the default is hessian=FALSE and it must be set to TRUE to get estimates of precision. This default was chosen because the marked package does not count parameters from the hessian, so there is no need to compute it for each model as with MARK. The hessian may never be needed if the model is clearly inferior. Also, this allows the model to be fitted again from the final or different estimates to check for convergence without the penalty of computing the hessian at the final values each time. Separate functions (cjs.hessian and js.hessian) are provided to compute and store in the model the variance-covariance matrix from the hessian at the final estimates as shown below: model=cjs.hessian(model) model  Once the hessian has been computed, printing the model will display the standard errors and 95% normal confidence intervals for the parameter estimates on the link scale (e.g., logit for$\phi$and$p$). You can set hessian=TRUE in the call to crm if you want to compute it when the model is fitted. You'll never fit only one model to data, so the most efficient approach is to call process.data and make.design.data separately and pass the results to crm so they can be used for each fitted model as shown below: dipper.proc=process.data(dipper) dipper.ddl=make.design.data(dipper.proc) Phi.sex=list(formula=~sex) model=crm(dipper.proc,dipper.ddl,model.parameters=list(Phi=Phi.sex), accumulate=FALSE)  Collapsing capture history records is controlled by the accumulate arguments in process.data and crm. In the above example, accumulate was TRUE for the process.data step but it was turned off for the model fit because it would have not resulted in any accumulation with the sex term in the model. Typically the default values are optimal but if you are fitting many models and most are complex models, then it may save time to accumulate in the process.data step but not for the model fitting step. If you fit more than a few models, use crm.wrapper rather than crm. It fits a set of models and returns a list with a model selection table that summarizes the fit of all the models. By default, crm.wrapper stores the model results externally and in the list it only stores the names of the files containing the models. If you set external=FALSE, then it will store the model results in the list as shown in the example below. dipper.proc=process.data(dipper) dipper.ddl=make.design.data(dipper.proc) fit.models=function() { Phi.sex=list(formula=~sex) Phi.time=list(formula=~time) p.sex=list(formula=~sex) p.dot=list(formula=~1) cml=create.model.list(c("Phi","p")) results=crm.wrapper(cml,data=dipper.proc, ddl=dipper.ddl, external=FALSE,accumulate=FALSE) return(results) } dipper.models=fit.models()  The model selection table is displayed with: dipper.models  A non-zero value for convergence means the model did not converge. If the models are not stored externally, an individual model can be extracted from the list with either the model number which is listed in the model table or with the model name which is the model formula specifications pasted together as shown below: dipper.models[[1]] dipper.models[["Phi.sex.p.dot"]]  If the models are stored externally (by setting external=TRUE in the crm.wrapper call), they can be retrieved with the function load.model as shown below: dipper.proc=process.data(dipper) dipper.ddl=make.design.data(dipper.proc) fit.models=function() { Phi.sex=list(formula=~sex) Phi.time=list(formula=~time) p.sex=list(formula=~sex) p.dot=list(formula=~1) cml=create.model.list(c("Phi","p")) results=crm.wrapper(cml,data=dipper.proc, ddl=dipper.ddl, external=TRUE,accumulate=FALSE, replace=TRUE) return(results) } dipper.models=fit.models()  model=load.model(dipper.models[[1]]) model  To make the analysis more interesting, we will add some covariates for$\phi$and p. For$\phi$, we will add a static covariate weight which is a random value between 1 and 10. For$\phi$, we also add a time-varying covariate Flood which is the same for all dippers but varies by time with a 0 value for times 1,4,5,6 and a value of 1 for times 2 and 3. For p, we will add a time-varying individual covariate td (trap dependence) which is the 0/1 value of the capture from the previous occasion. Static covariates are entered in the dataframe in a single column and time-varying covariates have a column and name for each occasion with the appropriate time appended at the end of each name. In this case, Flood will be Flood1,...,Flood6 and for td it will be td2,...,td7 because time for$\phi$is based on the time at the beginning of the interval and for p it is the time for the capture occasion. Below the names of the covariates in the dataframe are shown after they are created: data(dipper) # Add a dummy weight field which are random values from 1 to 10 set.seed(123) dipper$weight=round(runif(nrow(dipper),0,9),0)+1
Flood=matrix(rep(c(0,1,1,0,0,0),each=nrow(dipper)),ncol=6)
colnames(Flood)=paste("Flood",1:6,sep="")
dipper=cbind(dipper,Flood)
# Add td covariate, but exclude first release as a capture
# splitCH and process.ch are functions in the marked package
td=splitCH(dipper$ch) td=td[,1:6] releaseocc=process.ch(dipper$ch)$first releaseocc=cbind(1:length(releaseocc),releaseocc) releaseocc=releaseocc[releaseocc[,2]<nchar(dipper$ch[1]),]
td[releaseocc]=0
colnames(td)=paste("td",2:7,sep="")
dipper=cbind(dipper,td)
# show names
names(dipper)


Next we process the data with the default CJS model and make the design data with parameters that identify which covariates to use for each parameter. By default, each covariate in the dataframe is added to the design data of each parameter but the argument static can be used to limit the variables appended and the time-varying argument is needed to identify which covariates vary by time and should be collapsed into a single column. If we did not include the static argument for $\phi$, then weight and each of the td columns would also be included and for p, weight and each of the Flood columns would be included. If td was added to the time-varying argument for $\phi$ and Flood was added for p, then we would have also needed to add td1 and Flood7 due to the different time labels for $\phi$ and p. We have no intention to use those variables so it is best to specify what is needed. After specifying the design lists for $\phi$ and p, they are used in the call to make.design.data and the resulting dataframe names are shown below:

# Process data
dipper.proc=process.data(dipper)
# Create design data with static and time varying covariates
design.Phi=list(static=c("weight"),time.varying=c("Flood"))
design.p=list(static=c("sex"),time.varying=c("td"),
age.bins=c(0,1,20))
design.parameters=list(Phi=design.Phi,p=design.p)
ddl=make.design.data(dipper.proc,parameters=design.parameters)
names(ddl$Phi) names(ddl$p)


Next we define the models for $\phi$ and $p$ that we want to fit and call crm.

Phi.sfw=list(formula=~Flood+weight)
p.ast=list(formula=~age+sex+td)
model=crm(dipper.proc,ddl,hessian=TRUE,
model.parameters=list(Phi=Phi.sfw,p=p.ast))


Below we create a range of data values to compute predicted $\phi$ values and then plot the results for Flood and non-Flood years for a range of weights. Not surprising that the slope for weight is nearly 0 because the weight values were generated randomly.

newdipper=expand.grid(sex=c("Female","Male"),weight=1:10,Flood1=0,
Flood2=1,Flood3=1,Flood4=0,Flood5=0,Flood6=0,td2=0,td3=c(0,1),
td4=c(0,1),td5=c(0,1),td6=c(0,1),td7=c(0,1))
reals=predict(model,newdata=newdipper,se=TRUE)
reals$Phi$Flood=factor(reals$Phi$Flood,labels=c("Non-flood","Flood"))

## Try the marked package in your browser

Any scripts or data that you put into this service are public.

marked documentation built on Dec. 9, 2019, 9:06 a.m.