Description Usage Arguments Format Value Description HaldDP Object Methods Details Author(s) References Examples
Builds a HaldDP source attribution model
1 |
y |
a | ||||||||||||||||||||||||||||||||||
x |
an | ||||||||||||||||||||||||||||||||||
k |
a | ||||||||||||||||||||||||||||||||||
priors |
| ||||||||||||||||||||||||||||||||||
a_q |
the Dirichlet Process concentration parameter. | ||||||||||||||||||||||||||||||||||
inits |
initial values for the mcmc algorithm. This is an optional list
that may contain any of the following items:
|
Object of R6Class
with methods for creating a HaldDP model,
running the model, and accessing and plotting the results.
Object of HaldDP
with methods for creating a HaldDP model,
running the model, and accessing and plotting the results.
This function fits a non-parametric Poisson source attribution model for human cases of disease. It supports multiple types, sources, times and locations. The number of human cases for each type, time and location follow a Poisson likelihood.
mcmc_params(n_iter = 1000, burn_in = 0, thin = 1,
n_r = ceiling(private$nTypes * 0.2), update_schema = c('q','alpha','r'))
when called, sets the mcmc parameters.
n_iter
sets the number of iterations returned (after removing
burn_in
and thinning results by thin
i.e. a total of
(n_iter * thin) + burn_in iterations are run)
n_r
is a positive
integer that sets the total number of r_{ijtl}
parameters to be updated
at each time-location-source combination (the default is 20 percent updated
per iteration)
update_schema
a character vector containing the parameters to update
(any of 'q
','alpha
','r
').
update(n_iter, append = TRUE)
when called, updates the HaldDP
model by running n_iter
iterations.
If missing n_iter
, the n_iter
last set using mcmc_params()
or update()
is used.
append
is a logical value which determines whether the next n_iter
iterations
are appended to any previous iterations, or overwrites them. When
append = TRUE
, the starting values are the last iteration and no
burn_in
is removed. Running the model for the first time, or changing any
model or fitting parameters will set append = FALSE
.
get_data
returns a list containing the human data y
(an array y[types, times, locations]), the source data X
(an array X[types, sources, times]),
the prevalence data (an array k[sources, times]), the type names, source names,
time names, location names and number of different types, sources, times and locations.
get_priors
returns a list containing the DP concentration
parameter a_q
, and the priors (R6 class with members named a_alpha
(members are array a_alpha[sources, times, locations]
), a_r
(an array
a_r[types, sources, times]
), a_theta
and b_theta
).
get_inits
returns an R6 class holding the initial values
(members are alpha
(an array alpha[sources, times, locations]
),
theta
(an array theta[types, iters]
), s
(an array
s[types, iters]
), and r
(an array r[types, sources, times]
)).
get_mcmc_params
returns a list of fitting parameters (n_iter
,
append
, burn_in
, thin
, update_schema
(R6 class with members
alpha
, q
, r
)).
get_acceptance
returns an R6 class containing the acceptance
rates for each parameter (members are alpha
(an array alpha[sources, times, locations]
),
and r
(an array r[types, sources, times]
)).
extract(params = c("alpha", "q", "s", "r", "lambda_i", "xi", "xi_prop"),
times = NULL, locations = NULL, sources = NULL, types = NULL, iters = NULL,
flatten = FALSE, drop = TRUE)
returns a list contining a subset of the parameters
(determined by the params
vector, times
, locations
, sources
, types
and iters
).
If flatten
is set to TRUE
, it returns a dataframe with 1 column per
parameter, otherwise it returns a list containing params
containing a
subset of the following arrays: alpha[Sources, Times, Locations, iters]
, q[Types, iters]
,
s[Types, iters]
, r[Types, Sources, Times, iters]
,
lambda_i[Types, Times, Locations, iters]
,
xi[Sources, Times, Locations, iters]
.
drop
determines whether to delete the dimensions of an array which have only one
level when flatten = FALSE
.
summary(alpha = 0.05, params = c("alpha", "q", "s", "r", "lambda_i",
"xi" ,"xi_prop"), times = NULL, locations = NULL, sources = NULL,
types = NULL, iters = NULL, flatten = FALSE, drop = TRUE, CI_type = "chen-shao")
returns a list contining the
median and credible intervals for a subset of the parameters. The default credible
interval type are Chen-Shao ("chen-shao"
) highest posterior density intervals (alternatives
are "percentiles"
and "spin"
).
See extract
for details on the subsetting. xi_prop
returns the
proportion of cases attributed to each source j
and is calculated by dividing
each iteration of lambda_{jtl}
values by their sum within each time t
and location l
.
plot_heatmap(iters, cols = c("blue","white"), hclust_method = "complete")
Creates a dendrogram and heatmap for the type effect groupings (s
parameter
in the model). This uses the heatmap.2 function from gplots.
iters
is a vector containing the iterations to be used in constructing
the graph. Default is all iterations in posterior.
hclust_method
allows the user to select the method used by stats::hclust
to
cluster the type effect groupings s
.
cols
gives the colours for completely dissimilar (dissimilarity value
of 1), and identical (dissimilarity value of 0). All other values will be in
between the two chosen colours. See ?colorRampPalette for more details..
This function fits a source attribution model for human cases of disease. It supports multiple types, sources, times and locations. The number of human cases for each type, time and location follows a Poisson or Negative Binomial likelihood. Model
y_{itl}\sim\textsf{Poisson}(λ_{itl})
where
λ_{itl}=∑_{j=1}^{m}λ_{ijtl}=q_{k(i)}∑_{j=1}^{m}(r_{ijt}\cdot k_{j}\cdot alpha_{jtl})
The parameters are defined as follows:
a_{jtl}
is the unknown source effect for source j, time t, location l
q_{s(i)}
is the unknown type effect for type i in group s.
x_{ij}
is the known number of positive samples for each source j typei combination
n_{ij}
is the known total number of samples for each source j type i combination
k_{j}
is the fixed prevalence in source (i.e. the number of positive samples divided by the number of negative samples) j
r_{ijt}
is the unknown relative occurrence of type i on source j.
Priors
r_{.jt}\sim Dirichlet(a\_r_{1jt},..., a\_r_{njt})
a_{tl}\sim Dirichlet(a\_alpha_{1tl},..., a\_alpha_{mtl})
q\sim DP(a_q, Gamma(a_{theta},b_{theta}))
Chris Jewell and Poppy Miller p.miller at lancaster.ac.uk
Chen, M.-H. and Shao, Q.-M. (1998). Monte Carlo estimation of Bayesian credible and HPD intervals, Journal of Computational and Graphical Statistics, 7.
Liu Y, Gelman A, Zheng T (2015). Simulation-efficient shortest probability intervals, Statistics and Computing.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 | #### Format data using Y, X, and Prev functions #############################
## Input data must be in long format
y <- Y( # Cases
data = sim_SA$cases,
y = "Human",
type = "Type",
time = "Time",
location = "Location"
)
x <- X( # Sources
data = sim_SA$sources,
x = "Count",
type = "Type",
time = "Time",
source = "Source"
)
k <- Prev( # Prevalences
data = sim_SA$prev,
prev = "Value",
time = "Time",
source = "Source"
)
#### Create Dirichlet(1) priors #############################################
## Create alpha prior data frame
prior_alpha_long <- expand.grid(
Source = unique(sim_SA$sources$Source),
Time = unique(sim_SA$sources$Time),
Location = unique(sim_SA$cases$Location),
Alpha = 1
)
# Use the Alpha() constructor to specify alpha prior
prior_alpha <- Alpha(
data = prior_alpha_long,
alpha = 'Alpha',
source = 'Source',
time = 'Time',
location = 'Location'
)
## Create r prior data frame
prior_r_long <- expand.grid(
Type = unique(sim_SA$sources$Type),
Source = unique(sim_SA$sources$Source),
Time = unique(sim_SA$sources$Time),
Value = 0.1
)
# Use X() constructor to specify r prior
prior_r <- X(
data = prior_r_long,
x = 'Value',
type = 'Type',
time = 'Time',
source = 'Source'
)
## Pack all priors into a list
priors <- list(
a_theta = 0.01,
b_theta = 0.00001,
a_alpha = prior_alpha,
a_r = prior_r
)
## If all prior values are the same, they can be specified in shorthand
## Equivalent result to the longform priors specified above
priors <- list(
a_theta = 0.01,
b_theta = 0.00001,
a_alpha = 1,
a_r = 0.1
)
#### Set initial values (optional) ##########################################
types <- unique(sim_SA$cases$Type)
q_long <- data.frame(q=rep(15, length(types)), Type=types)
init_q <- Q(q_long, q = 'q', type = 'Type')
inits <- list(q = init_q) # Pack starting values into a list
#### Construct model ########################################################
my_model <- HaldDP(y = y, x = x, k = k, priors = priors, inits = inits, a_q = 0.1)
#### Set mcmc parameters ####################################################
my_model$mcmc_params(n_iter = 2, burn_in = 2, thin = 1)
#### Update model ###########################################################
my_model$update()
## Add an additional 10 iterations
my_model$update(n_iter = 2, append = TRUE)
#### Extract posterior ######################################################
## returns the posterior for the r, alpha, q, c,
## lambda_i, xi and xi_prop parameters,
## for all times, locations, sources and types
## the posterior is returned as a list or arrays
## Not run: str(my_model$extract())
## returns the posterior for the r and alpha parameters,
## for time 1, location B, sources Source3, and Source4,
## types 5, 25, and 50, and iterations 200:300
## the posterior is returned as a list of dataframes
## Not run:
str(my_model$extract(params = c("r", "alpha"),
times = "1", location = "B",
sources = c("Source3", "Source4"),
types = c("5", "25", "50"),
iters = 5:15,
flatten = TRUE))
## End(Not run)
#### Calculate medians and credible intervals ###############################
## Not run: my_model$summary(alpha = 0.05, CI_type = "chen-shao")
## subsetting is done in the same way as extract()
## Not run: my_model$summary(alpha = 0.05, CI_type = "chen-shao",
params = c("r", "alpha"),
times = "1", location = "B",
sources = c("Source3", "Source4"),
types = c("5", "25", "50"),
iters = 5:15,
flatten = TRUE)
## End(Not run)
#### Plot heatmap and dendrogram of the type effect grouping ################
my_model$plot_heatmap()
#### Extract data, initial values, prior values, acceptance
## rates for the mcmc algorithm and mcmc parameters
my_model$get_data()
my_model$get_inits()
my_model$get_priors()
my_model$get_acceptance()
my_model$get_mcmc_params()
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.