EpiBayes_s: Disease Model with Storage

Description Usage Arguments Details Value Parameter Meanings References See Also Examples

View source: R/EpiBayes_s.R

Description

This function is used to model disease (freedom and prevalence estimation) using a Bayesian hierarchical model one, two, or three levels of sampling. The most general, the three level, would be represented by the hierarchy: (region -> subzone -> cluster -> subject) and the two- and one- level sampling would simply be the same with either subzone removed or subzone and cluster removed, respectively. This function is the 'storage' model and hence stores all, even extraneous, output.

Usage

1
2
3
4
5
6
EpiBayes_s(H, k, n, seasons, reps, MCMCreps, poi = "tau", y = NULL,
  mumodes = matrix(c(0.5, 0.7, 0.5, 0.7, 0.02, 0.5, 0.02, 0.5), 4, 2, byrow =
  TRUE), pi.thresh = 0.02, tau.thresh = 0.05, gam.thresh = 0.1,
  tau.T = 0, poi.lb = 0, poi.ub = 1, p1 = 0.95, psi = 4,
  omegaparm = c(100, 1), gamparm = c(100, 1), tauparm = c(1, 1),
  etaparm = c(100, 6), thetaparm = c(100, 6), burnin = 1000)

Arguments

H

Number of subzones/states. Should be a 1 if there are only one or two levels of sampling. Integer scalar.

k

Number of clusters / farms / ponds / herds. Should be a 1 if there is only one level of sampling. Integer vector (H x 1).

n

Number of subjects / animals / mussels / pigs per cluster (can differ among clusters). Integer vector (sum(k) x 1).

seasons

Numeric season for each cluster in the order: Summer (1), Fall (2), Winter (3), Spring (4). Integer vector (sum(k) x 1).

reps

Number of (simulated) replicated data sets. Integer scalar.

MCMCreps

Number of iterations in the MCMC chain per replicated data set. Integer scalar.

poi

The p(arameter) o(f) i(nterest) specifies one of the subzone-level prevalence (gam) or the cluster-level prevalence (tau), indicating which variable with which to compute the simulation output p2.tilde, p4.tilde, and p6.tilde. Character scalar.

y

An optional input of sums of positive diagnostic testing results if one has a specific set of diagnostic testing outcomes for every subject (will simulate these if this is left as NULL). Integer matrix (reps x sum(k)).

mumodes

Modes and (a) 95th percentiles for mode ≤q 0.50 or (b) 5th percentiles for mode > 0.5 for season-specific mean prevalences for diseased clusters in the order: Summer, Fall, Winter, Spring. Real matrix (4 x 2).

pi.thresh

Threshold that we must show subject-level prevalence is below to declare disease freedom. Real scalar.

tau.thresh

Threshold that we must show cluster-level prevalence is below to declare disease freedom. Real scalar.

gam.thresh

Threshold that we must show subzone-level prevalence is below to declare disease freedom. Real scalar.

tau.T

Assumed true cluster-level prevalence (used to simulate data to feed into the Bayesian model). Real scalar.

poi.lb,poi.ub

Lower and upper bounds for posterior poi prevalences to show ability to capture poi with certain probability. Real scalars.

p1

Probability we must show poi prevalence is below / above the thresholds pi.thresh or tau.thresh or within specified bounds. Real scalar.

psi

(Inversely related to) the variability of the subject-level prevalences in diseased clusters. Real scalar.

omegaparm

Prior parameters for the beta-distributed input omega, which is the probability that the disease is in the region. Real vector (2 x 1).

gamparm

Prior parameters for the beta-distributed input gam, which is the subzone-level prevalence (or the prevalence among subzones). Real vector (2 x 1).

tauparm

Prior parameters for the beta-distributed input tau, which is the cluster-level prevalence (or the prevalence among clusters). Real vector (2 x 1).

etaparm

Prior parameters for the beta-distributed input eta, which is the sensitivity of the diagnostic test. Real vector (2 x 1).

thetaparm

Prior parameters for the beta-distributed input theta, which is the specificity of the diagnostic test. Real vector (2 x 1).

burnin

Number of MCMC iterations to discard from the beginning of the chain. Integer scalar.

Details

Note that this function performs in the same manner as EpiBayes_ns except it stores the posterior distributions of all of the parameters in the model and hence takes a bit longer to run. This model is a Bayesian hierarchical model that serves two main purposes:

The posterior distributions are avaialable for a particular parameter, say tau, by typing name_of_your_model$taumat. Note: be careful about the size of the taumat matrix you are calling. The last index of any of the variables from above is the MCMC replications and so we would typically always omit the last index when looking at any particular variable.

Value

This function is the 'storage' model meaning it returns the posterior distributions of all of the model parameters. It returns enough to perform inference and to perform diagnostic testing for the model fit nor MCMC convergence. If this is the first time running the model, we recommend that the user utilizes this function, EpiBayes_s, and diagnose issues before continuing with the 'no storage' model, EpiBayes_ns, for faster results, especially if it is being used as a simulation-type model and not a poserior inference-type model. Nevertheless, below are the outputs of this model.

Output Attributes Description
p2.tilde Real scalar Proportion of simulated data sets that result in the probability of poi prevalence below poi.thresh with probability p1
p4.tilde Real scalar Proportion of simulated data sets that result in the probability of poi prevalence above poi.thresh with probability p1
p6.tilde Real scalar Proportion of simulated data sets that result in the probability of poi prevalence between poi.lb and poi.ub with probability p1
taumat Real array (reps x H x MCMCreps) Posterior distributions of the cluster-level prevalence for all simulated data sets (i.e., reps)
gammat Real matrix (reps x MCMCreps) Posterior distribution of the subzone-level prevalence
omegamat Real matrix (reps x MCMCreps) Posterior distribution of the probability of the disease being in the region
z.gammat Real matrix (reps x MCMCreps) Posterior distribution for the latent indicator denoting whether the disease is present among the subzones
z.taumat Real array (reps x H x MCMCreps) Posterior distribution for the latent indicator denoting whether the disease is present among the clusters
pimat Real array (reps x sum(k) x MCMCreps) Posterior distribution for the subject-level (or within-cluster) prevalence
z.pimat Real array (reps x sum(k) x MCMCreps) Posterior distribution for the latent indicators denoting whether the disease is present within any given cluster
mumat Real matrix (reps x 4 x MCMCreps) Posterior distribution for the mean prevalence within diseased clusters
psi Real scalar User-input psi value
etamat Real matrix (reps x MCMCreps) Posterior distribution for the sensitivity of the diagnostic test
thetamat Real matrix (reps x MCMCreps) Posterior distribution for the specificity of the diagnostic test
c1mat Real array (reps x sum(k) x MCMCreps) Posterior disribution for the latent indicators denoting true positive results from the diagnostic test
c2mat Real array (reps x sum(k) x MCMCreps) Posterior disribution for the latent indicators denoting true negative results from the diagnostic test
mumh.tracker Real matrix (reps x 4) Vector of proportions of accepted Metropolis-Hastings proposals for the simulation from the posterior of input mu
y Integer matrix (reps x sum(k)) Matrix of observed counts of diseased subjects per cluster per simulated data set (or the actual observed counts input by the user)
ForOthers Various other data not intended to be used by the user, but used to pass information on to the plot, summary, and print methods

The posterior distribution output, say taumat, can be manipulated after it is returned with the coda package after it is converted to an mcmc object.

Parameter Meanings

Below we describe the meanings of the parameters in epidemiological language for ease of implementation and elicitation of inputs to this function.

Parameter (3-level) Description (2-level) Description
omega Probability of disease being in the region. Not used.
gam Subzone-level (between-subzone) prevalence. Probability of disease being in the region.
z.gam Subzone-level (between-subzone) prevalence latent indicator variable. Not used.
tau Cluster-level (between-cluster) prevalence. Same as (3-level).
z.tau Cluster-level (between-cluster) prevalence latent indicator variable. Same as (3-level).
pi Subject-level (within-cluster) prevalence. Same as (3-level).
z.pi Subject-level (within-cluster) prevalence latent indicator variable. Same as (3-level).
mu Mean prevalence among infected clusters. Same as (3-level).
psi (Related to) variability of prevalence among infected clusters (inversely related so higher psi -> lower variance of prevalences among diseased clusters). Same as (3-level).
eta Diagnostic test sensitivity. Same as (3-level).
theta Diagnostic test specificity. Same as (3-level).
c1 Latent count of true positive diagnostic test results. Same as (3-level).
c2 Latent count of true negative diagnostic test results. Same as (3-level).

Note that in the code, each of these variables are denoted, for example, taumat instead of plain tau to denote that that variable is a matrix (or, more generally an array) of values.

References

Branscum, A., Johnson, W., and Gardner, I. (2006) Sample size calculations for disease freedom and prevalence estimation surveys. Statistics in Medicine 25, 2658-2674.

See Also

The function EpiBayes_ns stores less output so the user may run the model more quickly, while losing the ability to diagnose any model fit or convergence issues.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
testrun_storage = EpiBayes_s(
		H = 2,
		k = rep(30, 2),
		n = rep(rep(150, 30), 2),
		seasons = rep(c(1, 2, 3, 4), each = 15),
		reps = 10,
		MCMCreps = 10,
		poi = "tau",
		y = NULL,
		mumodes = matrix(c(
			0.50, 0.70,
			0.50, 0.70,
			0.02, 0.50,
			0.02, 0.50
			), 4, 2, byrow = TRUE
		),
		pi.thresh = 0.05,
	    tau.thresh = 0.02,
     gam.thresh = 0.10,
		tau.T = 0,
		poi.lb = 0,
		poi.ub = 1,
		p1 = 0.95,
		psi = 4,
		omegaparm = c(100, 1),
		gamparm = c(100, 1),
		tauparm = c(1, 1),
		etaparm = c(100, 6),
		thetaparm = c(100, 6),
		burnin = 1
		)

testrun_storage
print(testrun_storage)
testrun_storagesummary = summary(testrun_storage, prob = 0.90, n.output = 5)
testrun_storagesummary
plot(testrun_storage$taumat[1, 1, ], type = "l")
plot(testrun_storage$gammat[1, ], type = "l")

## Can look at all posterior distributions
    plot(testrun_storage$pimat[1, 1, ], type = "l")
    plot(testrun_storage$omegamat[1, ], type = "l")

EpiBayes documentation built on May 30, 2017, 4:01 a.m.