This package implements a shiny user interface for multiple source capture recapture models. Multiple analysis types are supported, including: Log-linear models Bayesian model averaging * Bayesian latent class models.
Additionally, a power analysis calculator is provided for log-linear models.
The basic idea behind capture recapture modeling is to run two or more surveys (capture events) and record the individuals who participate. If many people participate in multiple events, it is likely that you have recruited a significant chunk of the population. On the other hand, if no one appears twice. then we would likely conclude that the population is much larger than the sample we collected.
There are two fundamental concerns that arise capture recapture:
List dependence is best dealt with using the Bayesian model averaging method, which will automatically detect potential dependencies.
The Log-linear and Bayesian latent class methods both provide mechanisms to deal with individual heterogeneity. These are covered in more detail below. However, including heterogeneity in your model (regardless of the method) will greatly increase the uncertainty of your estimates, and it is often difficult to determine which of the potential models to use.
If there is significant heterogeneity, the recommended approach is to stratify the data into subgroups with more uniform capture probability, calculate population size estimates for the subgroups, and add them together to get an overall estimate. For example, if age is highly related to probability of capture, split the data into young and old individuals. Then calculate the estimated population size for the young and old separately. The overall population size is then the sum of the two estimates.
This is a tutorial for the analysis capabilities of the shinyrecap package.
Data used:
Data reference:
Hickman, M., Higgins, V., Hope, V., Bellis, M., Tilling, K., Walker, A., & Henry, J. (2004). Injecting drug use in Brighton, Liverpool, and London: best estimates of prevalence and coverage of public health indicators. Journal of Epidemiology & Community Health, 58(9), 766-771.
To install the latest development version from the github repo run the following in R:
# If devtools is not installed:
# install.packages("devtools")
devtools::install_github("fellstat/shinyrecap")
library(shinyrecap)
# Launch the Shiny application for multiple source capture re-capture
launchShinyPopSize()
# Launch the power analysis application
launchShinyPopSize("power")
# Launch the an application to help format your data for analysis
launchShinyPopSize("convert")
The analysis application works with data in "Capture History" format. Each column should represent a capture event and each row should be a capture history. Optionally the last column may be the count of individuals with that particular capture history. "1" indicates capture, and "0" indicates not captured, so the history
0 1 0 1
represents an individual who was captured in the 2nd and 4th event, but not the 1st or 3rd. A properly structured 3 event CRC dataset would look something like:
V1 V2 V3 count
1 0 0 3
0 1 0 8
1 1 0 4
0 0 1 6
1 0 1 2
0 1 1 2
1 1 1 1
From the first row 1 0 0 3
, wee see that there were 3 individuals captured at event 1, but not at the 2nd and 3rd events. There was 1 individual captured in all 3 events (row: 1 1 1 1
).
In many studies, particularly those in the epidemiological domain, don't collect data in this format. Rather, at each event, individual participation is recorded, along with information about whether they were captured in previous events. Thus, a 3 event CRC study would have three datasets.
At the first event we count the number of participants
count
10
At the second event we count the number of participants, broken down by whether they participated in the first event.
V1 count
Yes 5
No 10
At the third event we count the number of participants, broken down by whether they participated in the first and/or second event.
V1 V2 count
Yes Yes 1
No Yes 2
Yes No 2
No No 6
Taken together, these three datasets are equivalent to the capture history dataset discussed above. shinyrecap
provides a user interface to help perform the conversion in format.
The first step toward performing an analysis is to load in the data. The application supports delimited text formats such as comma separated values (csv). Each column should represent a capture event and each row should be a capture history. Optionally the last column may be the count of individuals with that particular capture history.
This application supports four classes of log-linear models of varying model complexity.
When heterogeneity is present, there are different forms that this heterogeneity can take.
Selection between models is often done using either the AIC or BIC. For these, lower numbers indicate a "better" model fit.
Louis-Paul Rivest and Sophie Baillargeon (2014). Rcapture: Loglinear Models for Capture-Recapture Experiments. R package version 1.4-2. https://CRAN.R-project.org/package=Rcapture
This method allows the analyst to flexibly account for list dependency by creating models for all possible dependencies, and averaging over them in a way that is proportional to the probability that the dependence is correct.
The first step in the analysis is to formulate a prior for population size. This is represents the analyst's prior knowledge about population size along with uncertainty. By default, a "non-informative" prior is used.
Once the prior is specified, the probability distribution of the population size can be calculated.
Finally, it is useful to examine what dependencies exist in your data.
James Johndrow, Kristian Lum and Patrick Ball (2015). dga: Capture-Recapture Estimation using Bayesian Model Averaging. R package version 1.2. https://CRAN.R-project.org/package=dga
The Bayesian latent class model deals with heterogeneity in a novel way. It posits that there are unobserved subgroups in the data with different capture probabilities for each capture event. The number of these groups and their probabilities are unknown. The algorithm uses a Bayesian framework to estimates these, along with the population size.
The underlying algorithm is computationally quite complex and uses something called Markov Chain Monte Carlo (MCMC) to sample from the distribution of population size.
Daniel Manrique-Vallier (2017). LCMCR: Bayesian Non-Parametric Latent-Class Capture-Recapture. R package version 0.4.3. https://CRAN.R-project.org/package=LCMCR
This tab shows population estimates using each pair of capture events. It can be useful to look at this as a diagnostic step. Ideally each pair should have similar estimates of population size.
When planning a multiple source capture recapture study, it is useful to get an idea of how close your estimates of population size will be to their true value for given sample sizes. The power analysis lets you do this for log-linear modeling.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.