eoa: eoa - Evidence of Absence model estimation.

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/eoa.R

Description

This routine estimates an EoA model. An EoA model consists of a log-linear for lambda, the mean number of search targets per "cell", where "cell" could be a turbine, season, year, etc. Inputs include information about the number of targets found (Y) and the g-values (=probatility of discovery) at all searched sites. The method is Bayesian and allows either an uniform prior for lambda or an informed prior. Estimation is performed using JAGS.

Usage

1
2
3
4
eoa(lambda, beta.params, data, offset, priors = NULL, conf.level = 0.9,
  nburns = 5e+05, niters = 20000, nthins = 10, nchains = 3,
  nadapt = 3000, quiet = FALSE, seeds = NULL,
  vagueSDMultiplier = 100)

Arguments

lambda

A model formula for the lambda parameters of EoA. This formulat has the form Y ~ X1 + X2 + etc. (exactly like lm). Here, Y is a Vector of number of carcasses found, one element per "cell". For example, if multiple sites are searched in a single season, elements of Y are the number of targets found at each site. If multiple sites are searched during multiple seasons, Y contains number of targets found each site X season combination (length of Y is number of sites times number of seasons, assuming no missing). Covariate vectors (or matricies) X1 etc. must have the same length as Y and can be anything that formula handles (e.g., factors, interactions, I(x^2), etc.)

beta.params

A data frame containing, at a minimum, two columns named $alpha and $beta. These are the alpha and beta parameters of a Beta distribution that determine the g-values in each "cell". Length of $alpha and $beta is either one, in which case g is assumed constant over "cells", or one element per "cell".

data

An optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If variables are not found in data, the variables are taken from environment(formula), typically the environment from which eoa is called.

offset

An optional offset term(s) for the liner part of the model. This should be NULL or a numeric vector of length equal to the number of "cells". One or more offset terms can be included in the formula instead or as well (e.g., Y~ offset(log(a))+X), and if more than one is specified their sum is used. See model.offset. Due to the log link used here, offsets should generally be logged first. When the log of an offset is included, the linear part of the model is log(lambda/offset) ~ X1 + x2 + ...etc.

priors

An optional data frame specifying vague priors or containing information about (informed) prior distributions of coefficients in the lambda model.

If priors = NULL, vague normal priors are assumed for all coefficients. Coefficients are on a log scale. The vague normal priors chosen here have mean 0 and very large standard deviations. Check the output object to ensure the choosen standard deviations are sufficiently large enough to be considered vague.

If priors is not NULL, it should be a data frame containing at a minimum $mean and $sd specifying the mean and standard deviation of a normal prior for coefficients in the lambda model. Remember that coefficients are on the log scale, do not specify means on the response (i.e., count) scale. Provide log(count) means. The row.names of priors are matched to coefficients, and this provides a way to use informed priors for some coefficients and vague priors for others. For example, if X1 and X2 are both in the model and priors contains only one row with row.name == "X1", $mean and $sd values for that row are used to inform the coefficient of X1, but vague priors are used for all other coefficients. See examples.

conf.level

The confidence level for all posterior confidence intervals. Part of the value returned by this routine is $intervals containing posterior confidence intervals for all parameters. conf.level specifies the two-tailed confidence level for all these confidence intervals.

nburns

The number of burn-in steps to take during MCMC sampling.

niters

The number of sampling steps to take during MCMC sampling.

nthins

The amount of MCMC chain thinning to do. Every (nthins)-th sampling iteration in each chain is saved, while the rest are discarded. Each chain has niters iterations. In the end, a total of nchains*niters/nthins samples from the posterior are available to the user.

nchains

The number of MCMC sampling chains. Must specify 2 or more to check convergence.

nadapt

The number of adapting iterations to perform before burn-in. During adaptin, JAGS is trying to optimize it's proposal distribution and stepsize to increase convergence speed.

quiet

Logical indicating whether to print output during estimation. quiet==FALSE shows text based progress bars during estimation.

seeds

A vector of length nchains containing random number seeds for the MCMC sampler. If NULL, nchains random numbers are generated from R's base random number generator which is controled outside this routine using set.seed. Note that set.seed has no effect on the random number sequences used in JAGS because JAGS is a separate package. The seeds, whether chosen by this routine or specified, are stored in the output object.

Details

Observed quantities in the model are Y[i] = number of targets observed in cell i, and the covariates X[1i], X[2i], etc. Parameters in the model are M[i] = number of total targets in cell i including those observed and those missed, lambda[i] = the mean number of (or rate of) targets in cell i per offset[i], and g[i] = probability of observing a target in cell i.

If a log(offset) term is not included, lambda[i] is mean number of targets per cell. For example, if a cell is one season of monitoring at single sites, lambda is targets per season per site. If a cell is one season of monitoring at multiple sites (i.e., a facility), lambda is targets per season per facility. If a log(offset) term is included, lambda[i] is the cell-specific mean number of targets per offset unit. For example, if a cell contains counts from an entire wind power facility and the offset is log(number of turbines), lambda is number of targets per turbine by facility in the analysis. If the offset is log(days in season), lambda is number of targets per day for each cell. If two offset terms are included like offset(log(turbines)) + offset(log(days)), lambda is number of targets per turbine per day for the cell.

Parameter Mtot is the sum of all M parameters in the analysis. This derived parameters may not mean anything in specific problems, but is a fairly common derived parameter.

The EoA model implemented here is :

  1. Target count Y[i] is assumed to be binomial(M[i],g[i]).

  2. Binomial index M[i] is assumed to be poisson(lambda[i]).

  3. Binomial probability g[i] is assumed to be Beta(alpha[i],beta[i]).

  4. Poisson rate lambda[i] is constrained to be a log-linear function of covariaes. That is, log(lambda[i]) = A[0] + A[1]x[1i] + A[2]x[2i] + etc. If an log(offset) term is included, the model is log(lambda[i]/offset[i]) = A[0] + A[1]x[1i] + A[2]x[2i] + etc.

  5. Beta hyper-parameters alpha[i] and beta[i] are constants.

There are two ways to obtain the exact same results across multiple runs. One method is to specify seeds here. This will set the MCMC seeds in JAGS so that exact chains are reproduced. For example, if run1 is the result of a previous call, eoa(...,seeds=run1$seeds) will replicate run1 exactly. The second method is to use R's default set.seed just before calling this routine.

Value

An object of class "eoa". Eoa objects are a lists containing the following components:

Author(s)

Trent McDonald

See Also

labels.eoa, coef.eoa, predict.eoa, model.matrix.eoa

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
# A 3 year study of 7 sites. 21 "cells". lambda change = 20/year
set.seed(9430834) # fixes Y and g of this example, but not the RNG's used in chains
ns <- 3

ny <- 7
g <- data.frame(
 alpha = rnorm(ns*ny,70,2),
 beta = rnorm(ns*ny,700,25)
)
Y <- rbinom(ns*ny, c(rep(20,ny), rep(40,ny), rep(60,ny)), g$alpha/(g$alpha+g$beta))

df <- data.frame(year=factor(c(rep("2015",ny),rep("2016",ny),rep("2017",ny))),
   Year=c(rep(1,ny),rep(2,ny),rep(3,ny)))

# Uninformed eoa (use low number of iterations because it's and example)
eoa.1 <- eoa(Y~year, g, df, nburn = 1000, niters= 50*10, nthins = 10 )

# Repeat and get exact same answers
eoa.1 <- eoa(Y~year, g, df, nburn = 1000, niters= 50*10, nthins = 10, seeds=eoa.1$seeds )

# Run Informed EoA.  Assume prior annual lambda estimates
# are 10, 15, and 20, all with sd=.5.
prior <- data.frame(mean=c(log(10),log(15/10),log(20/10)),
  sd=c(0.5,0.5,0.5))
row.names(prior) <- c("(Intercept)","year2016","year2017")

ieoa <- eoa(Y~year, g, df, priors=prior, nburn = 1000, niters= 50*10, nthins = 10 )

# The above chains do not converge due to the low number of iterations used here.
# To check convergence and autocorrelation

gelman.diag(ieoa$out) # gelmanStats
gelman.plot(ieoa$out) # gelmanPlot

# Nice traceplots
library(lattice)
plot(ieoa$out) # tracePlot, all parameters
xyplot(ieoa$out[,c("(Intercept)","year2016","year2017")]) # nicer trace of coefficients

# Autocorrelation functions
acfplot(ieoa$out[,c("(Intercept)","year2016","year2017")], ylim=c(-.2,1), lag.max=300)
acfplot(ieoa$out[,c("(Intercept)","year2016","year2017")], ylim=c(-.2,1), thin=2)

# Density plots
densityplot(ieoa$out[,c("(Intercept)","year2016","year2017")])
densityplot(ieoa$out[,c("lambda[1]","lambda[8]","lambda[15]")]) # Mean lambda each year

# Correlation among coefficients
levelplot(ieoa$out[,c("(Intercept)","year2016","year2017")][[1]])

# QQ plots
qqmath(ieoa$out[,c("(Intercept)","year2016","year2017")])

tmcd82070/evoab documentation built on May 13, 2020, 11:25 p.m.