EpiBayes_s: Disease Model with Storage
In EpiBayes: Implements Hierarchical Bayesian Models for Epidemiological Applications

Description Usage Arguments Details Value Parameter Meanings References See Also Examples

This function is used to model disease (freedom and prevalence estimation) using a Bayesian hierarchical model one, two, or three levels of sampling. The most general, the three level, would be represented by the hierarchy: (region -> subzone -> cluster -> subject) and the two- and one- level sampling would simply be the same with either subzone removed or subzone and cluster removed, respectively. This function is the 'storage' model and hence stores all, even extraneous, output.

EpiBayes_s(H, k, n, seasons, reps, MCMCreps, poi = "tau", y = NULL,
  mumodes = matrix(c(0.5, 0.7, 0.5, 0.7, 0.02, 0.5, 0.02, 0.5), 4, 2, byrow =
  TRUE), pi.thresh = 0.02, tau.thresh = 0.05, gam.thresh = 0.1,
  tau.T = 0, poi.lb = 0, poi.ub = 1, p1 = 0.95, psi = 4,
  omegaparm = c(100, 1), gamparm = c(100, 1), tauparm = c(1, 1),
  etaparm = c(100, 6), thetaparm = c(100, 6), burnin = 1000)

`H`	Number of subzones/states. Should be a 1 if there are only one or two levels of sampling. Integer scalar.
`k`	Number of clusters / farms / ponds / herds. Should be a 1 if there is only one level of sampling. Integer vector (`H` x 1).
`n`	Number of subjects / animals / mussels / pigs per cluster (can differ among clusters). Integer vector (`sum(k)` x 1).
`seasons`	Numeric season for each cluster in the order: Summer (1), Fall (2), Winter (3), Spring (4). Integer vector (`sum(k)` x 1).
`reps`	Number of (simulated) replicated data sets. Integer scalar.
`MCMCreps`	Number of iterations in the MCMC chain per replicated data set. Integer scalar.
`poi`	The p(arameter) o(f) i(nterest) specifies one of the subzone-level prevalence (`gam`) or the cluster-level prevalence (`tau`), indicating which variable with which to compute the simulation output `p2.tilde`, `p4.tilde`, and `p6.tilde`. Character scalar.
`y`	An optional input of sums of positive diagnostic testing results if one has a specific set of diagnostic testing outcomes for every subject (will simulate these if this is left as `NULL`). Integer matrix (`reps` x `sum(k)`).
`mumodes`	Modes and (a) 95th percentiles for mode ≤q 0.50 or (b) 5th percentiles for mode > 0.5 for season-specific mean prevalences for diseased clusters in the order: Summer, Fall, Winter, Spring. Real matrix (4 x 2).
`pi.thresh`	Threshold that we must show subject-level prevalence is below to declare disease freedom. Real scalar.
`tau.thresh`	Threshold that we must show cluster-level prevalence is below to declare disease freedom. Real scalar.
`gam.thresh`	Threshold that we must show subzone-level prevalence is below to declare disease freedom. Real scalar.
`tau.T`	Assumed true cluster-level prevalence (used to simulate data to feed into the Bayesian model). Real scalar.
`poi.lb,poi.ub`	Lower and upper bounds for posterior `poi` prevalences to show ability to capture `poi` with certain probability. Real scalars.
`p1`	Probability we must show `poi` prevalence is below / above the thresholds `pi.thresh` or `tau.thresh` or within specified bounds. Real scalar.
`psi`	(Inversely related to) the variability of the subject-level prevalences in diseased clusters. Real scalar.
`omegaparm`	Prior parameters for the beta-distributed input `omega`, which is the probability that the disease is in the region. Real vector (2 x 1).
`gamparm`	Prior parameters for the beta-distributed input `gam`, which is the subzone-level prevalence (or the prevalence among subzones). Real vector (2 x 1).
`tauparm`	Prior parameters for the beta-distributed input `tau`, which is the cluster-level prevalence (or the prevalence among clusters). Real vector (2 x 1).
`etaparm`	Prior parameters for the beta-distributed input `eta`, which is the sensitivity of the diagnostic test. Real vector (2 x 1).
`thetaparm`	Prior parameters for the beta-distributed input `theta`, which is the specificity of the diagnostic test. Real vector (2 x 1).
`burnin`	Number of MCMC iterations to discard from the beginning of the chain. Integer scalar.

Note that this function performs in the same manner as EpiBayes_ns except it stores the posterior distributions of all of the parameters in the model and hence takes a bit longer to run. This model is a Bayesian hierarchical model that serves two main purposes:

Simulation model: can simulate data under user-specified conditions and run replicated data sets under the Bayesian model to observed the behavior of the system under random realizations of simulated data.
Posterior inference model: can use actual observed data from the field, run it through the Bayesian model, and make inference on parameter(s) of interest using the posterior distribution(s).

The posterior distributions are avaialable for a particular parameter, say tau, by typing name_of_your_model$taumat. Note: be careful about the size of the taumat matrix you are calling. The last index of any of the variables from above is the MCMC replications and so we would typically always omit the last index when looking at any particular variable.

If we want to look at the posterior distribution of the cluster level prevalence (tau) for the first replication, we will note that taumat is a matrix with rows indexed by replication and columns by MCMC replications. Then, we will type something like name_of_your_model$taumat[1, ] to visually inspect the posterior distribution in the form of a vector. For the second replication, we can type name_of_your_model$taumat[2, ], and so forth. Then, we can make histograms of these distributions if we so desire by the following code: hist(name_of_your_model$taumat[1, ], col = "cyan");box("plot"). To observe a trace plot, we can type: plot(name_of_your_model$taumat[1, ], type = "l") for all of the MCMC replications and we can look at the trace plot after a burnin of 1000 iterations by typing: plot(name_of_your_model$taumat[1, -c(1:1000)], type ="l").
If we want to look at the posterior distribution for the subject-level prevalence (pi) for the tenth replication in the third cluster, we would type name_of_your_model$pimat[10, 1, ] since the matrix containing the posterior distributions for the subject-level prevalences are indexed by replications in the first dimension, clusters in the second, and MCMC replications in the third. We can make histograms and trace plots using the same code as from above.

This function is the 'storage' model meaning it returns the posterior distributions of all of the model parameters. It returns enough to perform inference and to perform diagnostic testing for the model fit nor MCMC convergence. If this is the first time running the model, we recommend that the user utilizes this function, EpiBayes_s, and diagnose issues before continuing with the 'no storage' model, EpiBayes_ns, for faster results, especially if it is being used as a simulation-type model and not a poserior inference-type model. Nevertheless, below are the outputs of this model.

Output	Attributes	Description
`p2.tilde`	Real scalar	Proportion of simulated data sets that result in the probability of `poi` prevalence below `poi.thresh` with probability `p1`
`p4.tilde`	Real scalar	Proportion of simulated data sets that result in the probability of `poi` prevalence above `poi.thresh` with probability `p1`
`p6.tilde`	Real scalar	Proportion of simulated data sets that result in the probability of `poi` prevalence between `poi.lb` and `poi.ub` with probability `p1`
`taumat`	Real array (`reps` x `H` x `MCMCreps`)	Posterior distributions of the cluster-level prevalence for all simulated data sets (i.e., `reps`)
`gammat`	Real matrix (`reps` x `MCMCreps`)	Posterior distribution of the subzone-level prevalence
`omegamat`	Real matrix (`reps` x `MCMCreps`)	Posterior distribution of the probability of the disease being in the region
`z.gammat`	Real matrix (`reps` x `MCMCreps`)	Posterior distribution for the latent indicator denoting whether the disease is present among the subzones
`z.taumat`	Real array (`reps` x `H` x `MCMCreps`)	Posterior distribution for the latent indicator denoting whether the disease is present among the clusters
`pimat`	Real array (`reps` x `sum(k)` x `MCMCreps`)	Posterior distribution for the subject-level (or within-cluster) prevalence
`z.pimat`	Real array (`reps` x `sum(k)` x `MCMCreps`)	Posterior distribution for the latent indicators denoting whether the disease is present within any given cluster
`mumat`	Real matrix (`reps` x 4 x `MCMCreps`)	Posterior distribution for the mean prevalence within diseased clusters
`psi`	Real scalar	User-input `psi` value
`etamat`	Real matrix (`reps` x `MCMCreps`)	Posterior distribution for the sensitivity of the diagnostic test
`thetamat`	Real matrix (`reps` x `MCMCreps`)	Posterior distribution for the specificity of the diagnostic test
`c1mat`	Real array (`reps` x `sum(k)` x `MCMCreps`)	Posterior disribution for the latent indicators denoting true positive results from the diagnostic test
`c2mat`	Real array (`reps` x `sum(k)` x `MCMCreps`)	Posterior disribution for the latent indicators denoting true negative results from the diagnostic test
`mumh.tracker`	Real matrix (`reps` x 4)	Vector of proportions of accepted Metropolis-Hastings proposals for the simulation from the posterior of input `mu`
`y`	Integer matrix (`reps` x `sum(k)`)	Matrix of observed counts of diseased subjects per cluster per simulated data set (or the actual observed counts input by the user)
`ForOthers`		Various other data not intended to be used by the user, but used to pass information on to the `plot`, `summary`, and `print` methods

The posterior distribution output, say taumat, can be manipulated after it is returned with the coda package after it is converted to an mcmc object.

Below we describe the meanings of the parameters in epidemiological language for ease of implementation and elicitation of inputs to this function.

Parameter	(3-level) Description	(2-level) Description
`omega`	Probability of disease being in the region.	Not used.
`gam`	Subzone-level (between-subzone) prevalence.	Probability of disease being in the region.
`z.gam`	Subzone-level (between-subzone) prevalence latent indicator variable.	Not used.
`tau`	Cluster-level (between-cluster) prevalence.	Same as (3-level).
`z.tau`	Cluster-level (between-cluster) prevalence latent indicator variable.	Same as (3-level).
`pi`	Subject-level (within-cluster) prevalence.	Same as (3-level).
`z.pi`	Subject-level (within-cluster) prevalence latent indicator variable.	Same as (3-level).
`mu`	Mean prevalence among infected clusters.	Same as (3-level).
`psi`	(Related to) variability of prevalence among infected clusters (inversely related so higher `psi` -> lower variance of prevalences among diseased clusters).	Same as (3-level).
`eta`	Diagnostic test sensitivity.	Same as (3-level).
`theta`	Diagnostic test specificity.	Same as (3-level).
`c1`	Latent count of true positive diagnostic test results.	Same as (3-level).
`c2`	Latent count of true negative diagnostic test results.	Same as (3-level).

Note that in the code, each of these variables are denoted, for example, taumat instead of plain tau to denote that that variable is a matrix (or, more generally an array) of values.

Branscum, A., Johnson, W., and Gardner, I. (2006) Sample size calculations for disease freedom and prevalence estimation surveys. Statistics in Medicine 25, 2658-2674.

The function EpiBayes_ns stores less output so the user may run the model more quickly, while losing the ability to diagnose any model fit or convergence issues.

testrun_storage = EpiBayes_s(
		H = 2,
		k = rep(30, 2),
		n = rep(rep(150, 30), 2),
		seasons = rep(c(1, 2, 3, 4), each = 15),
		reps = 10,
		MCMCreps = 10,
		poi = "tau",
		y = NULL,
		mumodes = matrix(c(
			0.50, 0.70,
			0.50, 0.70,
			0.02, 0.50,
			0.02, 0.50
			), 4, 2, byrow = TRUE
		),
		pi.thresh = 0.05,
	    tau.thresh = 0.02,
     gam.thresh = 0.10,
		tau.T = 0,
		poi.lb = 0,
		poi.ub = 1,
		p1 = 0.95,
		psi = 4,
		omegaparm = c(100, 1),
		gamparm = c(100, 1),
		tauparm = c(1, 1),
		etaparm = c(100, 6),
		thetaparm = c(100, 6),
		burnin = 1
		)

testrun_storage
print(testrun_storage)
testrun_storagesummary = summary(testrun_storage, prob = 0.90, n.output = 5)
testrun_storagesummary
plot(testrun_storage$taumat[1, 1, ], type = "l")
plot(testrun_storage$gammat[1, ], type = "l")

## Can look at all posterior distributions
    plot(testrun_storage$pimat[1, 1, ], type = "l")
    plot(testrun_storage$omegamat[1, ], type = "l")