msocc is an R package for fitting and analyzing computationally efficient Bayesian multi-scale occupancy models. Its development was motivated by the use of environmental DNA (eDNA) for the monitoring of pathogens, parasites, and invasive species in the Yellowstone River. Consequently, the language in this package assumes this context. For more information on eDNA in this context, see Improved detection of rare, endangered and invasive trout in using a new largeā€volume sampling method for eDNA capture or Adding invasive species biosurveillance to the U.S. Geological Survey streamgage network.

Installation instructions

This package is still under development, but can be installed through GitHub using the following code:

install.packages('devtools') # only needed if devtools is not currently installed
devtools::install_github('StrattonCh/msocc')

Contact information

Christian Stratton (christianstratton@montana.edu) developed this R package.

Usage

The heavy lifting for this package is done with msocc_mod, which fits the models. The results of this function are then passed to posterior_summary to numerically summarize the posterior distribution and cred_plot to visually summarize it. Additionally, this package provides tools to simulate data from multi-scale occupancy models. This package has a companion R Shiny web application, available here.

Simulated examples

Constant psi, theta, and p

To showcase the utility of this package, we begin with a simple example where there are 10 sites of interest, from which we collect 5 samples each and analyze 5 PCR replicates for the presence of the target DNA. To simulate data consistent with this structure, we use the following code.

library(msocc)
set.seed(10022019)
sim <- msocc_sim(M = 10, J = 5, K = 5, psi = 0.8, theta = 0.75, p = 0.9)
str(sim)

To generate these data, we have to specify all the occupancy parameters. Here, we chose to set the probability of presence at the site to be 0.8, the probability of occurence in the sample (conditional on presence at the site) to be 0.75, and the probability of detection in the replicate (conditional on occurence in the sample) to be 0.9. The simulated data take the following form:

head(sim$resp)

To fit a model to these data, we use the msocc_mod function.

mod <- msocc_mod(wide_data = sim$resp,
                 site = list(model = ~ 1, cov_tbl = sim$site),
                 sample = list(model = ~ 1, cov_tbl = sim$sample),
                 rep = list(model = ~ 1, cov_tbl = sim$rep),
                 progress = F)

Within msocc_mod, we need to specify the site, sample, and rep lists which define the models to be fit at each level of the hierarchy. In this case, we are fitting only an intercept at each level which implies a constant probability of presence, occurence, and detection. For numerical summaries of this model, we can use posterior_summary. First, an overall summary.

posterior_summary(mod, print = T)

By default, posterior_summary returns only the unique combinations of psi, theta, and p, but always returns a row for each site. Based on these estimates, our model returned estimates consistent with the parameter values we used to generate the data. For a more in-depth look at each of the site, sample, and replicate levels and a description of uncertainty, we can specify the level in posterior_summary.

posterior_summary(mod, level = 'site', print = T)
posterior_summary(mod, level = 'sample', print = T)
posterior_summary(mod, level = 'rep', print = T)

Constant psi, theta as a function of covariates, constant p

Next, we consider the case where the sample level occurence probability is a function of covariates. First, we generate the data.

sim <- msocc_sim(M = 10, J = 20, K = 5, psi = 0.8, p = 0.9,
                 sample.df = data.frame(site = rep(1:10, each = 20),
                                        sample = rep(1:20, 10),
                                        x = rnorm(200)),
                 sample.mod = ~x,
                 alpha = c(1,1))
str(sim)

To generate theta as a function of covariates, we specify the data frame, model, and parameters needed to compute theta, defined as theta = exp(W %*% alpha) / (1 + exp(W %*% alpha)). Next, we fit the model and provide the first six rows of a numerical summary of it.

mod <- msocc_mod(wide_data = sim$resp,
                 site = list(model = ~1, cov_tbl = sim$site),
                 sample = list(model = ~x, cov_tbl = sim$sample),
                 rep = list(model = ~1, cov_tbl = sim$rep),
                 progress = F)
head(posterior_summary(mod, print = T))

From the output above, we can see that we now have a unique value of theta for all 200 samples collected in this hypothetical experiment. Rather than looking through 200 rows of output to summarize the sample level occurence probabilities, we can plot these credibility intervals using cred_plot. In the code below, n controls how many samples are plotted on each plot.

cred.plots <- cred_plot(mod, level = 'sample', n = 20)
gridExtra::grid.arrange(cred.plots[[1]], cred.plots[[2]], nrow = 2, ncol = 1)

These two graphics allow us to visualize the uncertainty in the posterior distribution of theta for each sample from the first two sites. Did these credibility intervals capture the value of theta that generated them in the simulation?

cred.plots <- cred_plot(mod, level = 'sample', n = 20, truth = sim$params$theta)
gridExtra::grid.arrange(cred.plots[[1]], cred.plots[[2]], nrow = 2, ncol = 1)


StrattonCh/msocc documentation built on Dec. 22, 2020, 2:51 a.m.