Description Usage Arguments Details Value Parameter Meanings References See Also Examples
This function is used to model disease (freedom and prevalence estimation) using a Bayesian hierarchical model one, two, or three levels of sampling. The most general, the three level, would be represented by the hierarchy: (region -> subzone -> cluster -> subject) and the two- and one- level sampling would simply be the same with either subzone removed or subzone and cluster removed, respectively. This function is the 'storage' model and hence stores all, even extraneous, output.
1 2 3 4 5 6 | EpiBayes_s(H, k, n, seasons, reps, MCMCreps, poi = "tau", y = NULL,
mumodes = matrix(c(0.5, 0.7, 0.5, 0.7, 0.02, 0.5, 0.02, 0.5), 4, 2, byrow =
TRUE), pi.thresh = 0.02, tau.thresh = 0.05, gam.thresh = 0.1,
tau.T = 0, poi.lb = 0, poi.ub = 1, p1 = 0.95, psi = 4,
omegaparm = c(100, 1), gamparm = c(100, 1), tauparm = c(1, 1),
etaparm = c(100, 6), thetaparm = c(100, 6), burnin = 1000)
|
H |
Number of subzones/states. Should be a 1 if there are only one or two levels of sampling. Integer scalar. |
k |
Number of clusters / farms / ponds / herds. Should be a 1 if there is only one
level of sampling. Integer vector ( |
n |
Number of subjects / animals / mussels / pigs per cluster (can differ among
clusters). Integer vector ( |
seasons |
Numeric season for each cluster in the order: Summer (1), Fall (2),
Winter (3), Spring (4). Integer vector ( |
reps |
Number of (simulated) replicated data sets. Integer scalar. |
MCMCreps |
Number of iterations in the MCMC chain per replicated data set. Integer scalar. |
poi |
The p(arameter) o(f) i(nterest) specifies one of the subzone-level prevalence
( |
y |
An optional input of sums of positive diagnostic testing results if one has a
specific set of diagnostic testing outcomes for every subject (will simulate these
if this is left as |
mumodes |
Modes and (a) 95th percentiles for mode ≤q 0.50 or (b) 5th percentiles for mode > 0.5 for season-specific mean prevalences for diseased clusters in the order: Summer, Fall, Winter, Spring. Real matrix (4 x 2). |
pi.thresh |
Threshold that we must show subject-level prevalence is below to declare disease freedom. Real scalar. |
tau.thresh |
Threshold that we must show cluster-level prevalence is below to declare disease freedom. Real scalar. |
gam.thresh |
Threshold that we must show subzone-level prevalence is below to declare disease freedom. Real scalar. |
tau.T |
Assumed true cluster-level prevalence (used to simulate data to feed into the Bayesian model). Real scalar. |
poi.lb,poi.ub |
Lower and upper bounds for posterior |
p1 |
Probability we must show |
psi |
(Inversely related to) the variability of the subject-level prevalences in diseased clusters. Real scalar. |
omegaparm |
Prior parameters for the beta-distributed input |
gamparm |
Prior parameters for the beta-distributed input |
tauparm |
Prior parameters for the beta-distributed input |
etaparm |
Prior parameters for the beta-distributed input |
thetaparm |
Prior parameters for the beta-distributed input |
burnin |
Number of MCMC iterations to discard from the beginning of the chain. Integer scalar. |
Note that this function performs in the same manner as EpiBayes_ns
except
it stores the posterior distributions of all of the parameters in the model and
hence takes a bit longer to run.
This model is a Bayesian hierarchical model that serves two main purposes:
Simulation model: can simulate data under user-specified conditions and run replicated data sets under the Bayesian model to observed the behavior of the system under random realizations of simulated data.
Posterior inference model: can use actual observed data from the field, run it through the Bayesian model, and make inference on parameter(s) of interest using the posterior distribution(s).
The posterior distributions are avaialable for a particular parameter, say
tau
, by typing name_of_your_model$taumat
.
Note: be careful about the size of the taumat
matrix you are calling. The
last index of any of the variables from above is the MCMC replications and so we
would typically always omit the last index when looking at any particular variable.
If we want to look at the posterior distribution of the cluster level
prevalence (tau
) for the first replication, we will note that taumat is
a matrix with rows indexed by replication and columns by MCMC replications. Then,
we will type something like name_of_your_model$taumat[1, ]
to visually
inspect the posterior distribution in the form of a vector. For the second
replication, we can type name_of_your_model$taumat[2, ]
, and so forth.
Then, we can make histograms of these distributions if we so desire by the
following code:
hist(name_of_your_model$taumat[1, ], col = "cyan");box("plot")
.
To observe a trace plot, we can type:
plot(name_of_your_model$taumat[1, ], type = "l")
for all of the MCMC replications and we can look
at the trace plot after a burnin of 1000 iterations by typing:
plot(name_of_your_model$taumat[1, -c(1:1000)], type ="l")
.
If we want to look at the posterior distribution for the subject-level
prevalence (pi
) for the tenth replication in the third cluster, we would
type name_of_your_model$pimat[10, 1, ]
since the matrix containing the
posterior distributions for the subject-level prevalences are indexed by
replications in the first dimension, clusters in the second, and MCMC
replications in the third. We can make histograms and trace plots using the same
code as from above.
This function is the 'storage' model meaning it returns the posterior distributions of
all of the model parameters. It returns enough to perform inference and
to perform diagnostic testing for the model fit nor MCMC convergence.
If this is the first time running the model, we recommend that the user utilizes
this function, EpiBayes_s
, and diagnose issues before continuing with
the 'no storage' model, EpiBayes_ns
, for faster results, especially if
it is being used as a simulation-type model and not a poserior inference-type model.
Nevertheless, below are the outputs of this model.
Output | Attributes | Description |
p2.tilde | Real scalar | Proportion of simulated data sets that result in the probability of poi prevalence below poi.thresh with probability p1 |
p4.tilde | Real scalar | Proportion of simulated data sets that result in the probability of poi prevalence above poi.thresh with probability p1 |
p6.tilde | Real scalar | Proportion of simulated data sets that result in the probability of poi prevalence between poi.lb and poi.ub with probability p1 |
taumat | Real array (reps x H x MCMCreps ) | Posterior distributions of the cluster-level prevalence for all simulated data sets (i.e., reps ) |
gammat | Real matrix (reps x MCMCreps ) | Posterior distribution of the subzone-level prevalence |
omegamat | Real matrix (reps x MCMCreps ) | Posterior distribution of the probability of the disease being in the region |
z.gammat | Real matrix (reps x MCMCreps ) | Posterior distribution for the latent indicator denoting whether the disease is present among the subzones |
z.taumat | Real array (reps x H x MCMCreps ) | Posterior distribution for the latent indicator denoting whether the disease is present among the clusters |
pimat | Real array (reps x sum(k) x MCMCreps ) | Posterior distribution for the subject-level (or within-cluster) prevalence |
z.pimat | Real array (reps x sum(k) x MCMCreps ) | Posterior distribution for the latent indicators denoting whether the disease is present within any given cluster |
mumat | Real matrix (reps x 4 x MCMCreps ) | Posterior distribution for the mean prevalence within diseased clusters |
psi | Real scalar | User-input psi value |
etamat | Real matrix (reps x MCMCreps ) | Posterior distribution for the sensitivity of the diagnostic test |
thetamat | Real matrix (reps x MCMCreps ) | Posterior distribution for the specificity of the diagnostic test |
c1mat | Real array (reps x sum(k) x MCMCreps ) | Posterior disribution for the latent indicators denoting true positive results from the diagnostic test |
c2mat | Real array (reps x sum(k) x MCMCreps ) | Posterior disribution for the latent indicators denoting true negative results from the diagnostic test |
mumh.tracker | Real matrix (reps x 4) | Vector of proportions of accepted Metropolis-Hastings proposals for the simulation from the posterior of input mu |
y | Integer matrix (reps x sum(k) ) | Matrix of observed counts of diseased subjects per cluster per simulated data set (or the actual observed counts input by the user) |
ForOthers | Various other data not intended to be used by the user, but used to pass information on to the plot , summary , and print methods |
|
The posterior distribution output, say taumat
, can be manipulated after it is
returned with the coda package after it is converted to an mcmc
object.
Below we describe the meanings of the parameters in epidemiological language for ease of implementation and elicitation of inputs to this function.
Parameter | (3-level) Description | (2-level) Description |
omega | Probability of disease being in the region. | Not used. |
gam | Subzone-level (between-subzone) prevalence. | Probability of disease being in the region. |
z.gam | Subzone-level (between-subzone) prevalence latent indicator variable. | Not used. |
tau | Cluster-level (between-cluster) prevalence. | Same as (3-level). |
z.tau | Cluster-level (between-cluster) prevalence latent indicator variable. | Same as (3-level). |
pi | Subject-level (within-cluster) prevalence. | Same as (3-level). |
z.pi | Subject-level (within-cluster) prevalence latent indicator variable. | Same as (3-level). |
mu | Mean prevalence among infected clusters. | Same as (3-level). |
psi | (Related to) variability of prevalence among infected clusters (inversely related so higher psi -> lower variance of prevalences among diseased clusters). | Same as (3-level). |
eta | Diagnostic test sensitivity. | Same as (3-level). |
theta | Diagnostic test specificity. | Same as (3-level). |
c1 | Latent count of true positive diagnostic test results. | Same as (3-level). |
c2 | Latent count of true negative diagnostic test results. | Same as (3-level). |
Note that in the code, each of these variables are denoted, for example,
taumat
instead of plain tau
to denote that that variable is a
matrix (or, more generally an array) of values.
Branscum, A., Johnson, W., and Gardner, I. (2006) Sample size calculations for disease freedom and prevalence estimation surveys. Statistics in Medicine 25, 2658-2674.
The function EpiBayes_ns
stores less output so the user may run the model
more quickly, while losing the ability to diagnose any model fit or convergence
issues.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 | testrun_storage = EpiBayes_s(
H = 2,
k = rep(30, 2),
n = rep(rep(150, 30), 2),
seasons = rep(c(1, 2, 3, 4), each = 15),
reps = 10,
MCMCreps = 10,
poi = "tau",
y = NULL,
mumodes = matrix(c(
0.50, 0.70,
0.50, 0.70,
0.02, 0.50,
0.02, 0.50
), 4, 2, byrow = TRUE
),
pi.thresh = 0.05,
tau.thresh = 0.02,
gam.thresh = 0.10,
tau.T = 0,
poi.lb = 0,
poi.ub = 1,
p1 = 0.95,
psi = 4,
omegaparm = c(100, 1),
gamparm = c(100, 1),
tauparm = c(1, 1),
etaparm = c(100, 6),
thetaparm = c(100, 6),
burnin = 1
)
testrun_storage
print(testrun_storage)
testrun_storagesummary = summary(testrun_storage, prob = 0.90, n.output = 5)
testrun_storagesummary
plot(testrun_storage$taumat[1, 1, ], type = "l")
plot(testrun_storage$gammat[1, ], type = "l")
## Can look at all posterior distributions
plot(testrun_storage$pimat[1, 1, ], type = "l")
plot(testrun_storage$omegamat[1, ], type = "l")
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.