msocc is an R package for fitting and analyzing computationally efficient Bayesian multi-scale occupancy models. Its development was motivated by the use of environmental DNA (eDNA) for the monitoring of pathogens, parasites, and invasive species in the Yellowstone River. Consequently, the language in this package assumes this context. For more information on eDNA in this context, see Improved detection of rare, endangered and invasive trout in using a new largeāvolume sampling method for eDNA capture or Adding invasive species biosurveillance to the U.S. Geological Survey streamgage network.
This package is still under development, but can be installed through GitHub using the following code:
install.packages('devtools') # only needed if devtools is not currently installed devtools::install_github('StrattonCh/msocc')
Christian Stratton (christianstratton@montana.edu) developed this R package.
The heavy lifting for this package is done with msocc_mod
, which fits the models. The results of this function are then passed to posterior_summary
to numerically summarize the posterior distribution and cred_plot
to visually summarize it. Additionally, this package provides tools to simulate data from multi-scale occupancy models. This package has a companion R Shiny web application, available here.
To showcase the utility of this package, we begin with a simple example where there are 10 sites of interest, from which we collect 5 samples each and analyze 5 PCR replicates for the presence of the target DNA. To simulate data consistent with this structure, we use the following code.
library(msocc) set.seed(10022019)
sim <- msocc_sim(M = 10, J = 5, K = 5, psi = 0.8, theta = 0.75, p = 0.9) str(sim)
To generate these data, we have to specify all the occupancy parameters. Here, we chose to set the probability of presence at the site to be 0.8, the probability of occurence in the sample (conditional on presence at the site) to be 0.75, and the probability of detection in the replicate (conditional on occurence in the sample) to be 0.9. The simulated data take the following form:
head(sim$resp)
To fit a model to these data, we use the msocc_mod
function.
mod <- msocc_mod(wide_data = sim$resp, site = list(model = ~ 1, cov_tbl = sim$site), sample = list(model = ~ 1, cov_tbl = sim$sample), rep = list(model = ~ 1, cov_tbl = sim$rep), progress = F)
Within msocc_mod
, we need to specify the site
, sample
, and rep
lists which define the models to be fit at each level of the hierarchy. In this case, we are fitting only an intercept at each level which implies a constant probability of presence, occurence, and detection. For numerical summaries of this model, we can use posterior_summary
. First, an overall summary.
posterior_summary(mod, print = T)
By default, posterior_summary
returns only the unique combinations of psi, theta, and p, but always returns a row for each site. Based on these estimates, our model returned estimates consistent with the parameter values we used to generate the data. For a more in-depth look at each of the site, sample, and replicate levels and a description of uncertainty, we can specify the level
in posterior_summary
.
posterior_summary(mod, level = 'site', print = T) posterior_summary(mod, level = 'sample', print = T) posterior_summary(mod, level = 'rep', print = T)
Next, we consider the case where the sample level occurence probability is a function of covariates. First, we generate the data.
sim <- msocc_sim(M = 10, J = 20, K = 5, psi = 0.8, p = 0.9, sample.df = data.frame(site = rep(1:10, each = 20), sample = rep(1:20, 10), x = rnorm(200)), sample.mod = ~x, alpha = c(1,1)) str(sim)
To generate theta
as a function of covariates, we specify the data frame, model, and parameters needed to compute theta, defined as theta = exp(W %*% alpha) / (1 + exp(W %*% alpha))
. Next, we fit the model and provide the first six rows of a numerical summary of it.
mod <- msocc_mod(wide_data = sim$resp, site = list(model = ~1, cov_tbl = sim$site), sample = list(model = ~x, cov_tbl = sim$sample), rep = list(model = ~1, cov_tbl = sim$rep), progress = F) head(posterior_summary(mod, print = T))
From the output above, we can see that we now have a unique value of theta
for all 200 samples collected in this hypothetical experiment. Rather than looking through 200 rows of output to summarize the sample level occurence probabilities, we can plot these credibility intervals using cred_plot
. In the code below, n
controls how many samples are plotted on each plot.
cred.plots <- cred_plot(mod, level = 'sample', n = 20) gridExtra::grid.arrange(cred.plots[[1]], cred.plots[[2]], nrow = 2, ncol = 1)
These two graphics allow us to visualize the uncertainty in the posterior distribution of theta
for each sample from the first two sites. Did these credibility intervals capture the value of theta
that generated them in the simulation?
cred.plots <- cred_plot(mod, level = 'sample', n = 20, truth = sim$params$theta) gridExtra::grid.arrange(cred.plots[[1]], cred.plots[[2]], nrow = 2, ncol = 1)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.