# HaldDP: Builds a HaldDP source attribution model In sourceR: Fits a Non-Parametric Bayesian Source Attribution Model

## Description

Builds a HaldDP source attribution model

## Usage

 1 HaldDP(y, x, k, priors, a_q, inits = NULL) 

## Arguments

y

a Y object containing case observations

x

an X object containing source observations

k

a Prev object containing source prevalences

priors

priors list with elements named a_r, a_alpha, a_theta and b_theta, corresponding to the prior parameters for the r, alpha, and base distribution for the DP parameters respectively.

 Parameter Prior Distribution Prior Parameters a_r Dirichlet(concentration) A single positive number or an X object containing the prior values for each source, time and type. If a single number is supplied, it will be used for all times, sources and types. a_alpha Dirichlet(concentration) A single positive number or an Alpha object containing the prior values for each source, time and location. If a single number is supplied, it will be used for all times, sources and locations. Type effects base Gamma(shape, rate) Single number for each of the shape (a_theta) and distribution parameters rate (b_theta) of the Gamma base distribution.
a_q

the Dirichlet Process concentration parameter.

inits

initial values for the mcmc algorithm. This is an optional list that may contain any of the following items: alpha,q, and r.

 Parameter Description r An object of type X giving the initial values for $R$ matrix, If not specified defaults to the element-wise maximum likelihood estimates of r from the source matrix. Source effects (alpha) An object of type Alpha specifying alpha value for each source/time/location. If not specified, default initial values for the source effects are drawn from the prior distribution. Type effects (q) An object of type Q giving the initial clustering and values for q If not specified, defaults to a single group with a theta value calculated as θ = sum(y_itl) / sum_l=1^L(sum_t=1^T(sum_i=1^n(sum_j=1^m(alpha_jtl * r_ijt * k_jt)))). i.e. theta = sum(y_itl) / sum(lambda_ijtl / theta)

## Format

Object of R6Class with methods for creating a HaldDP model, running the model, and accessing and plotting the results.

## Value

Object of HaldDP with methods for creating a HaldDP model, running the model, and accessing and plotting the results.

## Description

This function fits a non-parametric Poisson source attribution model for human cases of disease. It supports multiple types, sources, times and locations. The number of human cases for each type, time and location follow a Poisson likelihood.

mcmc_params(n_iter = 1000, burn_in = 0, thin = 1, n_r = ceiling(private$nTypes * 0.2), update_schema = c('q','alpha','r')) when called, sets the mcmc parameters. n_iter sets the number of iterations returned (after removing burn_in and thinning results by thin i.e. a total of (n_iter * thin) + burn_in iterations are run) n_r is a positive integer that sets the total number of r_{ijtl} parameters to be updated at each time-location-source combination (the default is 20 percent updated per iteration) update_schema a character vector containing the parameters to update (any of 'q','alpha','r'). update(n_iter, append = TRUE) when called, updates the HaldDP model by running n_iter iterations. If missing n_iter, the n_iter last set using mcmc_params() or update() is used. append is a logical value which determines whether the next n_iter iterations are appended to any previous iterations, or overwrites them. When append = TRUE, the starting values are the last iteration and no burn_in is removed. Running the model for the first time, or changing any model or fitting parameters will set append = FALSE. get_data returns a list containing the human data y (an array y[types, times, locations]), the source data X (an array X[types, sources, times]), the prevalence data (an array k[sources, times]), the type names, source names, time names, location names and number of different types, sources, times and locations. get_priors returns a list containing the DP concentration parameter a_q, and the priors (R6 class with members named a_alpha (members are array a_alpha[sources, times, locations]), a_r (an array a_r[types, sources, times]), a_theta and b_theta). get_inits returns an R6 class holding the initial values (members are alpha (an array alpha[sources, times, locations]), theta (an array theta[types, iters]), s (an array s[types, iters]), and r (an array r[types, sources, times])). get_mcmc_params returns a list of fitting parameters (n_iter, append, burn_in, thin, update_schema (R6 class with members alpha, q, r)). get_acceptance returns an R6 class containing the acceptance rates for each parameter (members are alpha (an array alpha[sources, times, locations]), and r (an array r[types, sources, times])). extract(params = c("alpha", "q", "s", "r", "lambda_i", "xi", "xi_prop"), times = NULL, locations = NULL, sources = NULL, types = NULL, iters = NULL, flatten = FALSE, drop = TRUE) returns a list contining a subset of the parameters (determined by the params vector, times, locations, sources, types and iters). If flatten is set to TRUE, it returns a dataframe with 1 column per parameter, otherwise it returns a list containing params containing a subset of the following arrays: alpha[Sources, Times, Locations, iters], q[Types, iters], s[Types, iters], r[Types, Sources, Times, iters], lambda_i[Types, Times, Locations, iters], xi[Sources, Times, Locations, iters]. drop determines whether to delete the dimensions of an array which have only one level when flatten = FALSE. summary(alpha = 0.05, params = c("alpha", "q", "s", "r", "lambda_i", "xi" ,"xi_prop"), times = NULL, locations = NULL, sources = NULL, types = NULL, iters = NULL, flatten = FALSE, drop = TRUE, CI_type = "chen-shao") returns a list contining the median and credible intervals for a subset of the parameters. The default credible interval type are Chen-Shao ("chen-shao") highest posterior density intervals (alternatives are "percentiles" and "spin"). See extract for details on the subsetting. xi_prop returns the proportion of cases attributed to each source j and is calculated by dividing each iteration of lambda_{jtl} values by their sum within each time t and location l. plot_heatmap(iters, cols = c("blue","white"), hclust_method = "complete") Creates a dendrogram and heatmap for the type effect groupings (s parameter in the model). This uses the heatmap.2 function from gplots. iters is a vector containing the iterations to be used in constructing the graph. Default is all iterations in posterior. hclust_method allows the user to select the method used by stats::hclust to cluster the type effect groupings s. cols gives the colours for completely dissimilar (dissimilarity value of 1), and identical (dissimilarity value of 0). All other values will be in between the two chosen colours. See ?colorRampPalette for more details.. ## Details This function fits a source attribution model for human cases of disease. It supports multiple types, sources, times and locations. The number of human cases for each type, time and location follows a Poisson or Negative Binomial likelihood. Model y_{itl}\sim\textsf{Poisson}(λ_{itl}) where λ_{itl}=∑_{j=1}^{m}λ_{ijtl}=q_{k(i)}∑_{j=1}^{m}(r_{ijt}\cdot k_{j}\cdot alpha_{jtl}) The parameters are defined as follows: a_{jtl} is the unknown source effect for source j, time t, location l q_{s(i)} is the unknown type effect for type i in group s. x_{ij} is the known number of positive samples for each source j typei combination n_{ij} is the known total number of samples for each source j type i combination k_{j} is the fixed prevalence in source (i.e. the number of positive samples divided by the number of negative samples) j r_{ijt} is the unknown relative occurrence of type i on source j. Priors r_{.jt}\sim Dirichlet(a\_r_{1jt},..., a\_r_{njt}) a_{tl}\sim Dirichlet(a\_alpha_{1tl},..., a\_alpha_{mtl}) q\sim DP(a_q, Gamma(a_{theta},b_{theta})) ## Author(s) Chris Jewell and Poppy Miller p.miller at lancaster.ac.uk ## References Chen, M.-H. and Shao, Q.-M. (1998). Monte Carlo estimation of Bayesian credible and HPD intervals, Journal of Computational and Graphical Statistics, 7. Liu Y, Gelman A, Zheng T (2015). Simulation-efficient shortest probability intervals, Statistics and Computing. ## Examples   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 #### Format data using Y, X, and Prev functions ############################# ## Input data must be in long format y <- Y( # Cases data = sim_SA$cases, y = "Human", type = "Type", time = "Time", location = "Location" ) x <- X( # Sources data = sim_SA$sources, x = "Count", type = "Type", time = "Time", source = "Source" ) k <- Prev( # Prevalences data = sim_SA$prev, prev = "Value", time = "Time", source = "Source" ) #### Create Dirichlet(1) priors ############################################# ## Create alpha prior data frame prior_alpha_long <- expand.grid( Source = unique(sim_SA$sources$Source), Time = unique(sim_SA$sources$Time), Location = unique(sim_SA$cases$Location), Alpha = 1 ) # Use the Alpha() constructor to specify alpha prior prior_alpha <- Alpha( data = prior_alpha_long, alpha = 'Alpha', source = 'Source', time = 'Time', location = 'Location' ) ## Create r prior data frame prior_r_long <- expand.grid( Type = unique(sim_SA$sources$Type), Source = unique(sim_SA$sources$Source), Time = unique(sim_SA$sources$Time), Value = 0.1 ) # Use X() constructor to specify r prior prior_r <- X( data = prior_r_long, x = 'Value', type = 'Type', time = 'Time', source = 'Source' ) ## Pack all priors into a list priors <- list( a_theta = 0.01, b_theta = 0.00001, a_alpha = prior_alpha, a_r = prior_r ) ## If all prior values are the same, they can be specified in shorthand ## Equivalent result to the longform priors specified above priors <- list( a_theta = 0.01, b_theta = 0.00001, a_alpha = 1, a_r = 0.1 ) #### Set initial values (optional) ########################################## types <- unique(sim_SA$cases$Type) q_long <- data.frame(q=rep(15, length(types)), Type=types) init_q <- Q(q_long, q = 'q', type = 'Type') inits <- list(q = init_q) # Pack starting values into a list #### Construct model ######################################################## my_model <- HaldDP(y = y, x = x, k = k, priors = priors, inits = inits, a_q = 0.1) #### Set mcmc parameters #################################################### my_model$mcmc_params(n_iter = 2, burn_in = 2, thin = 1) #### Update model ########################################################### my_model$update() ## Add an additional 10 iterations my_model$update(n_iter = 2, append = TRUE) #### Extract posterior ###################################################### ## returns the posterior for the r, alpha, q, c, ## lambda_i, xi and xi_prop parameters, ## for all times, locations, sources and types ## the posterior is returned as a list or arrays ## Not run: str(my_model$extract()) ## returns the posterior for the r and alpha parameters, ## for time 1, location B, sources Source3, and Source4, ## types 5, 25, and 50, and iterations 200:300 ## the posterior is returned as a list of dataframes ## Not run: str(my_model$extract(params = c("r", "alpha"), times = "1", location = "B", sources = c("Source3", "Source4"), types = c("5", "25", "50"), iters = 5:15, flatten = TRUE)) ## End(Not run) #### Calculate medians and credible intervals ############################### ## Not run: my_model$summary(alpha = 0.05, CI_type = "chen-shao") ## subsetting is done in the same way as extract() ## Not run: my_model$summary(alpha = 0.05, CI_type = "chen-shao", params = c("r", "alpha"), times = "1", location = "B", sources = c("Source3", "Source4"), types = c("5", "25", "50"), iters = 5:15, flatten = TRUE) ## End(Not run) #### Plot heatmap and dendrogram of the type effect grouping ################ my_model$plot_heatmap() #### Extract data, initial values, prior values, acceptance ## rates for the mcmc algorithm and mcmc parameters my_model$get_data() my_model$get_inits() my_model$get_priors() my_model$get_acceptance() my_model\$get_mcmc_params() 

sourceR documentation built on Aug. 31, 2020, 5:06 p.m.