eoa: eoa - Evidence of Absence model estimation.
In tmcd82070/evoab: Evidence of Absence (EoA)

Description Usage Arguments Details Value Author(s) See Also Examples

This routine estimates an EoA model. An EoA model consists of a log-linear for lambda, the mean number of search targets per "cell", where "cell" could be a turbine, season, year, etc. Inputs include information about the number of targets found (Y) and the g-values (=probatility of discovery) at all searched sites. The method is Bayesian and allows either an uniform prior for lambda or an informed prior. Estimation is performed using JAGS.

eoa(lambda, beta.params, data, offset, priors = NULL, conf.level = 0.9,
  nburns = 5e+05, niters = 20000, nthins = 10, nchains = 3,
  nadapt = 3000, quiet = FALSE, seeds = NULL,
  vagueSDMultiplier = 100)

`lambda`	A model formula for the lambda parameters of EoA. This formulat has the form `Y ~ X1 + X2` + etc. (exactly like `lm`). Here, `Y` is a Vector of number of carcasses found, one element per "cell". For example, if multiple sites are searched in a single season, elements of Y are the number of targets found at each site. If multiple sites are searched during multiple seasons, `Y` contains number of targets found each site X season combination (length of `Y` is number of sites times number of seasons, assuming no missing). Covariate vectors (or matricies) `X1` etc. must have the same length as `Y` and can be anything that `formula` handles (e.g., factors, interactions, `I(x^2)`, etc.)
`beta.params`	A data frame containing, at a minimum, two columns named `$alpha` and `$beta`. These are the alpha and beta parameters of a Beta distribution that determine the g-values in each "cell". Length of `$alpha` and `$beta` is either one, in which case g is assumed constant over "cells", or one element per "cell".
`data`	An optional data frame, list or environment (or object coercible by `as.data.frame` to a data frame) containing the variables in the model. If variables are not found in data, the variables are taken from environment(formula), typically the environment from which `eoa` is called.
`offset`	An optional offset term(s) for the liner part of the model. This should be NULL or a numeric vector of length equal to the number of "cells". One or more offset terms can be included in the formula instead or as well (e.g., Y~ offset(log(a))+X), and if more than one is specified their sum is used. See `model.offset`. Due to the log link used here, offsets should generally be logged first. When the log of an offset is included, the linear part of the model is log(lambda/offset) ~ X1 + x2 + ...etc.
`priors`	An optional data frame specifying vague priors or containing information about (informed) prior distributions of coefficients in the lambda model. If `priors` = NULL, vague normal priors are assumed for all coefficients. Coefficients are on a log scale. The vague normal priors chosen here have mean 0 and very large standard deviations. Check the output object to ensure the choosen standard deviations are sufficiently large enough to be considered vague. If `priors` is not NULL, it should be a data frame containing at a minimum `$mean` and `$sd` specifying the mean and standard deviation of a normal prior for coefficients in the lambda model. Remember that coefficients are on the log scale, do not specify means on the response (i.e., count) scale. Provide log(count) means. The `row.names` of `priors` are matched to coefficients, and this provides a way to use informed priors for some coefficients and vague priors for others. For example, if `X1` and `X2` are both in the model and `priors` contains only one row with `row.name` == "X1", `$mean` and `$sd` values for that row are used to inform the coefficient of `X1`, but vague priors are used for all other coefficients. See examples.
`conf.level`	The confidence level for all posterior confidence intervals. Part of the value returned by this routine is `$intervals` containing posterior confidence intervals for all parameters. `conf.level` specifies the two-tailed confidence level for all these confidence intervals.
`nburns`	The number of burn-in steps to take during MCMC sampling.
`niters`	The number of sampling steps to take during MCMC sampling.
`nthins`	The amount of MCMC chain thinning to do. Every (`nthins`)-th sampling iteration in each chain is saved, while the rest are discarded. Each chain has `niters` iterations. In the end, a total of `nchains*niters/nthins` samples from the posterior are available to the user.
`nchains`	The number of MCMC sampling chains. Must specify 2 or more to check convergence.
`nadapt`	The number of adapting iterations to perform before burn-in. During adaptin, JAGS is trying to optimize it's proposal distribution and stepsize to increase convergence speed.
`quiet`	Logical indicating whether to print output during estimation. `quiet==FALSE` shows text based progress bars during estimation.
`seeds`	A vector of length `nchains` containing random number seeds for the MCMC sampler. If NULL, `nchains` random numbers are generated from R's base random number generator which is controled outside this routine using `set.seed`. Note that `set.seed` has no effect on the random number sequences used in JAGS because JAGS is a separate package. The seeds, whether chosen by this routine or specified, are stored in the output object.

Observed quantities in the model are Y[i] = number of targets observed in cell i, and the covariates X[1i], X[2i], etc. Parameters in the model are M[i] = number of total targets in cell i including those observed and those missed, lambda[i] = the mean number of (or rate of) targets in cell i per offset[i], and g[i] = probability of observing a target in cell i.

If a log(offset) term is not included, lambda[i] is mean number of targets per cell. For example, if a cell is one season of monitoring at single sites, lambda is targets per season per site. If a cell is one season of monitoring at multiple sites (i.e., a facility), lambda is targets per season per facility. If a log(offset) term is included, lambda[i] is the cell-specific mean number of targets per offset unit. For example, if a cell contains counts from an entire wind power facility and the offset is log(number of turbines), lambda is number of targets per turbine by facility in the analysis. If the offset is log(days in season), lambda is number of targets per day for each cell. If two offset terms are included like offset(log(turbines)) + offset(log(days)), lambda is number of targets per turbine per day for the cell.

Parameter Mtot is the sum of all M parameters in the analysis. This derived parameters may not mean anything in specific problems, but is a fairly common derived parameter.

The EoA model implemented here is :

Target count Y[i] is assumed to be binomial(M[i],g[i]).
Binomial index M[i] is assumed to be poisson(lambda[i]).
Binomial probability g[i] is assumed to be Beta(alpha[i],beta[i]).
Poisson rate lambda[i] is constrained to be a log-linear function of covariaes. That is, log(lambda[i]) = A[0] + A[1]x[1i] + A[2]x[2i] + etc. If an log(offset) term is included, the model is log(lambda[i]/offset[i]) = A[0] + A[1]x[1i] + A[2]x[2i] + etc.
Beta hyper-parameters alpha[i] and beta[i] are constants.

There are two ways to obtain the exact same results across multiple runs. One method is to specify seeds here. This will set the MCMC seeds in JAGS so that exact chains are reproduced. For example, if run1 is the result of a previous call, eoa(...,seeds=run1$seeds) will replicate run1 exactly. The second method is to use R's default set.seed just before calling this routine.

An object of class "eoa". Eoa objects are a lists containing the following components:

estimates : a matrix containing parameter estimates and standard errors. This matrix contains one row per parameter and two columns. rownames(x$estimates) lists the parameters. Columns are Estimate, containing the median value of of that parameter's posterior marginal, and SD, containing the standard deviation of the parameter's posterior marginal. The following parameters are included:
- M : Mortality estimates in individual "cells". For example, M[4] is the estimate (median of posterior) of mortalities at the site represented in cell 4 of the data set.
- Mtot : A derived parameter, the median of the posterior distribution of the sum of all M's.
- lambda : Mortality rates, potentially averaged over multiple cells in the problem. Lambda's average over cells according to the model, M's do not. M's apply to single cells.
- coefficients : One or more coefficients in the log model for lambda.
intervals : a matrix containing two-tailed posterior confidence intervals. Same dimension as estimates but columns are lower and upper endpoints of the intervals. Confidence level of all intervals is stored in conf.level.
out : a mcmc.list object containing the output of the JAGS runs. This is the raw output from the coda.samples function of the coda package and can be used to check convergence, etc. This object can be subsetted to chains for one or more parameter by thinking of the list as a 2-D matrix and using named dimensions. For example, x$out[,c("(Intercept)","year")] produces an mcmc.list object containing only chains for parameters named "(Intercept)" and "year". See examples for some useful plots.
jags.model : a jags.model object used to estimate the model. This object can be used to compute additional convergence and model fit statistics, such as DIC (see rjags::dic.samples).
priors : the mean and standard deviation of the normal prior distributions for all coefficients. This differs from the input priors parameter because matching of terms and row names has been done. There is exactly one row in the output priors for every parameter, while that is not necessarily true for the input.
seeds : the vector of MCMC random number seeds actually used in the analysis. Specify these as input seeds to repeat, exactly, the MCMC sampling.
offset : the vector of offset terms used in the linear part of the model. If no offset terms were specified, this vector contains zeros.
coef.labels : character vector containing the labels of coefficients in the model. This is used to distinguish coefficients from derived parameters. All parameters in ests not listed here are considered derived. See labels.eoa, and coef.eoa.
call : the call that invoked this function.
data : the input data set. This can be handy for prediction and model selection.
converged : Logical value for whether this routine thinks the MCMC chain has converged. The MCMC sampling is deemed to have converged if all Gelman R-hats are less than some criterion. This function simply calls checkIsConverged with an mcmc.list object containing coefficients only and a criterion of 1.1. Convergence checking is only done on lambda model coefficients.
Rhats : Gelman's R-hats for every parameter. This is one of the output components of checkIsConverged.
autoCorrelated : Logical value indicating whether this routine thinks the MCMC is autocorrelated. Normally, one does not want autocorrelation in the final chains. This function simply calls checkIsAutocorrelated with the mcmc.list and criterion=0.4 and lag=2.
autoCorrs : a vector of autocorrelations used the judge whether the chains are autocorrelated. This is a vector with same length as number of coefficients, and is the second component of the output of checkIsAutocorrelated. Only model coefficients are checked for autocorrelation.
conf.level : the two-tailed confidence level for all confidence intervals in intervals. Because the entire mcmc.list is returned, one can change confidence levels without re-running by executing apply(as.matrix(x$out), 2, quantile, p=c(<new quantiles>))

Trent McDonald

labels.eoa, coef.eoa, predict.eoa, model.matrix.eoa

# A 3 year study of 7 sites. 21 "cells". lambda change = 20/year
set.seed(9430834) # fixes Y and g of this example, but not the RNG's used in chains
ns <- 3

ny <- 7
g <- data.frame(
 alpha = rnorm(ns*ny,70,2),
 beta = rnorm(ns*ny,700,25)
)
Y <- rbinom(ns*ny, c(rep(20,ny), rep(40,ny), rep(60,ny)), g$alpha/(g$alpha+g$beta))

df <- data.frame(year=factor(c(rep("2015",ny),rep("2016",ny),rep("2017",ny))),
   Year=c(rep(1,ny),rep(2,ny),rep(3,ny)))

# Uninformed eoa (use low number of iterations because it's and example)
eoa.1 <- eoa(Y~year, g, df, nburn = 1000, niters= 50*10, nthins = 10 )

# Repeat and get exact same answers
eoa.1 <- eoa(Y~year, g, df, nburn = 1000, niters= 50*10, nthins = 10, seeds=eoa.1$seeds )

# Run Informed EoA.  Assume prior annual lambda estimates
# are 10, 15, and 20, all with sd=.5.
prior <- data.frame(mean=c(log(10),log(15/10),log(20/10)),
  sd=c(0.5,0.5,0.5))
row.names(prior) <- c("(Intercept)","year2016","year2017")

ieoa <- eoa(Y~year, g, df, priors=prior, nburn = 1000, niters= 50*10, nthins = 10 )

# The above chains do not converge due to the low number of iterations used here.
# To check convergence and autocorrelation

gelman.diag(ieoa$out) # gelmanStats
gelman.plot(ieoa$out) # gelmanPlot

# Nice traceplots
library(lattice)
plot(ieoa$out) # tracePlot, all parameters
xyplot(ieoa$out[,c("(Intercept)","year2016","year2017")]) # nicer trace of coefficients

# Autocorrelation functions
acfplot(ieoa$out[,c("(Intercept)","year2016","year2017")], ylim=c(-.2,1), lag.max=300)
acfplot(ieoa$out[,c("(Intercept)","year2016","year2017")], ylim=c(-.2,1), thin=2)

# Density plots
densityplot(ieoa$out[,c("(Intercept)","year2016","year2017")])
densityplot(ieoa$out[,c("lambda[1]","lambda[8]","lambda[15]")]) # Mean lambda each year

# Correlation among coefficients
levelplot(ieoa$out[,c("(Intercept)","year2016","year2017")][[1]])

# QQ plots
qqmath(ieoa$out[,c("(Intercept)","year2016","year2017")])