eoar: eoar - Evidence of Absence Regression model estimation.

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/eoar.R

Description

This routine estimates an EoAR model. An EoAR model consists of a log-linear model for lambda, the mean number of search targets per "cell", where "cell" is a measured experimental unit such as a turbine, season, year, etc. Inputs include information about the number of targets found (Y) and the g-values (=probatility of discovery) at all searched sites. The method is Bayesian and allows either an uniform prior for lambda or an informed prior. Estimation is performed in JAGS.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
eoar(
  lambda,
  beta.params,
  data,
  offset,
  priors = NULL,
  conf.level = 0.9,
  nburns = 5e+05,
  niters = 20000,
  nthins = 10,
  nchains = 3,
  nadapt = 3000,
  quiet = FALSE,
  seeds = NULL,
  vagueSDMultiplier = 100
)

Arguments

lambda

A model formula for the lambda parameters of EoAR. This formulat has the form Y ~ X1 + X2 + etc. (exactly like lm). Here, Y is a vector containing number of carcasses found, one element per "cell". For example, if multiple sites are searched in a single season, elements of Y should be the number of targets found at each site. If multiple sites are searched during multiple seasons, Y should contain the number of targets found at each site X season combination. Length of Y is number of sites times number of seasons, assuming no missing. Covariate vectors (or matricies) X1 etc. must have the same length as Y and can be anything that formula handles (e.g., factors, interactions, I(x^2), etc.)

beta.params

A data frame containing, at a minimum, two columns named $alpha and $beta. These are the alpha and beta parameters of a Beta distribution that determine the g-values in each "cell". Length of $alpha and $beta is either one, in which case g is assumed constant over "cells", or one element per "cell".

data

An optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing all variables in the model. If variables are not found in data, the variables are taken from environment(formula), typically the environment from which eoar is called.

offset

An optional offset term(s) for the linear part of the model. This should be NULL or a numeric vector of length equal to the number of "cells". One or more offset terms can be included in the formula (e.g., Y~ offset(log(a))+X), and if more than one is specified their sum is used. See model.offset. Due to the log link used here, offsets should generally be logged first. When the log of an offset is included, the linear part of the model can be read as log(lambda/offset) ~ X1 + x2 + ...etc.

priors

An optional data frame specifying vague priors or containing information about (informed) prior distributions of coefficients in the lambda model.

If priors = NULL, vague normal priors are assumed for all coefficients. Coefficients are on a log scale. The vague normal priors chosen here have mean 0 and very large standard deviations. Check the output object to ensure the choosen standard deviations are sufficiently large enough to be considered vague.

If priors is not NULL, it should be a data frame containing at a minimum $mean and $sd specifying the mean and standard deviation of a normal prior for coefficients in the lambda model. Remember that coefficients are on the log scale, do not specify means on the response (i.e., count) scale. Provide log(count) means. The row.names of priors are matched to coefficients, and this provides a way to use informed priors for some coefficients and vague priors for others. For example, if X1 and X2 are both in the model and priors contains only one row with row.name == "X1", $mean and $sd values for that row are used to inform the coefficient of X1, but vague priors are used for all other coefficients. See examples.

conf.level

The confidence level for all posterior confidence intervals. Part of the value returned by this routine is $intervals containing posterior confidence intervals for all parameters. conf.level specifies the two-tailed confidence level for all these confidence intervals.

nburns

The number of burn-in steps to take during MCMC sampling.

niters

The number of sampling steps to take during MCMC sampling.

nthins

The amount of MCMC chain thinning to do. Every (nthins)-th sampling iteration in each chain is saved, while the rest are discarded. Each chain has niters iterations. In the end, a total of nchains*niters/nthins samples from the posterior are available to the user.

nchains

The number of MCMC sampling chains. Must specify 2 or more to check convergence.

nadapt

The number of adapting iterations to perform before burn-in. During adaptin, JAGS is trying to optimize it's proposal distribution and stepsize to increase convergence speed.

quiet

Logical indicating whether to print output during estimation. quiet==FALSE shows text based progress bars during estimation.

seeds

A vector of length nchains containing random number seeds for the MCMC sampler. If NULL, nchains random numbers are generated from R's base random number generator which is controled outside this routine using set.seed. Note that set.seed has no effect on the random number sequences used in JAGS because JAGS is a separate package. The seeds, whether chosen by this routine or specified, are stored in the output object.

Details

Observed quantities in the model are Y[i] = number of targets observed in cell i, and the covariates X[1i], X[2i], etc. Parameters in the model are M[i] = number of total targets in cell i including those observed and those missed, lambda[i] = the mean number of (or rate of) targets in cell i per offset[i], and g[i] = probability of observing a target in cell i.

If a log(offset) term is not included, lambda[i] is mean number of targets per cell. For example, if a cell is one season of monitoring at single sites, lambda is targets per season per site. If a cell is one season of monitoring at multiple sites (i.e., a facility), lambda is targets per season per facility. If a log(offset) term is included, lambda[i] is the cell-specific mean number of targets per offset unit. For example, if a cell contains counts from an entire wind power facility and the offset is log(number of turbines), lambda is number of targets per turbine by facility in the analysis. If the offset is log(days in season), lambda is number of targets per day for each cell. If two offset terms are included like offset(log(turbines)) + offset(log(days)), lambda is number of targets per turbine per day for the cell.

Parameter Mtot is the sum of all M parameters in the analysis. This derived parameters may not mean anything in specific problems, but is a fairly common derived parameter.

The EoAR model implemented here is :

  1. Target count Y[i] is assumed to be binomial(M[i],g[i]).

  2. Binomial index M[i] is assumed to be poisson(lambda[i]).

  3. Binomial probability g[i] is assumed to be Beta(alpha[i],beta[i]).

  4. Poisson rate lambda[i] is constrained to be a log-linear function of covariaes. That is, log(lambda[i]) = A[0] + A[1]x[1i] + A[2]x[2i] + etc. If an log(offset) term is included, the model is log(lambda[i]/offset[i]) = A[0] + A[1]x[1i] + A[2]x[2i] + etc.

  5. Beta hyper-parameters alpha[i] and beta[i] are constants.

There are two ways to obtain the exact same results across multiple runs. One method is to specify seeds here. This will set the MCMC seeds in JAGS so that exact chains are reproduced. For example, if run1 is the result of a previous call, eoar(...,seeds=run1$seeds) will replicate run1 exactly. The second method is to use R's default set.seed just before calling this routine.

Value

An object of class "eoar". EoAR objects are lists containing the following components:

Author(s)

Trent McDonald

See Also

labels.eoar, coef.eoar, predict.eoar, model.matrix.eoar

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
# A 3 year study of 7 sites. 21 "cells". lambda change = 20/year
set.seed(9430834) # fixes Y and g of this example, but not the RNG's used in chains
ns <- 3

ny <- 7
g <- data.frame(
 alpha = rnorm(ns*ny,70,2),
 beta = rnorm(ns*ny,700,25)
)
Y <- rbinom(ns*ny, c(rep(20,ny), rep(40,ny), rep(60,ny)), g$alpha/(g$alpha+g$beta))

df <- data.frame(year=factor(c(rep("2015",ny),rep("2016",ny),rep("2017",ny))),
   Year=c(rep(1,ny),rep(2,ny),rep(3,ny)))

# Uninformed eoar (use low number of iterations because it's and example)
eoar.1 <- eoar(Y~year, g, df, nburn = 1000, niters= 50*10, nthins = 10 )

# Repeat and get exact same answers
eoar.1 <- eoar(Y~year, g, df, nburn = 1000, niters= 50*10, nthins = 10, seeds=eoar.1$seeds )

# Run Informed EoAR.  Assume prior annual lambda estimates
# are 10, 15, and 20, all with sd=.5.
prior <- data.frame(mean=c(log(10),log(15/10),log(20/10)),
  sd=c(0.5,0.5,0.5))
row.names(prior) <- c("(Intercept)","year2016","year2017")

ieoar <- eoar(Y~year, g, df, priors=prior, nburn = 1000, niters= 50*10, nthins = 10 )

# The above chains do not converge due to the low number of iterations used here.
# To check convergence and autocorrelation

gelman.diag(ieoar$out) # gelmanStats
gelman.plot(ieoar$out) # gelmanPlot

# Nice traceplots
library(lattice)
plot(ieoar$out) # tracePlot, all parameters
xyplot(ieoar$out[,c("(Intercept)","year2016","year2017")]) # nicer trace of coefficients

# Autocorrelation functions
acfplot(ieoar$out[,c("(Intercept)","year2016","year2017")], ylim=c(-.2,1), lag.max=300)
acfplot(ieoar$out[,c("(Intercept)","year2016","year2017")], ylim=c(-.2,1), thin=2)

# Density plots
densityplot(ieoar$out[,c("(Intercept)","year2016","year2017")])
densityplot(ieoar$out[,c("lambda[1]","lambda[8]","lambda[15]")]) # Mean lambda each year

# Correlation among coefficients
levelplot(ieoar$out[,c("(Intercept)","year2016","year2017")][[1]])

# QQ plots
qqmath(ieoar$out[,c("(Intercept)","year2016","year2017")])

tmcd82070/EoAR documentation built on July 13, 2021, 5:52 p.m.