knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(kableExtra) library(hrbrthemes) library(MCMCglmm) library(projections) library(EpiEstim) library(ggplot2) library(incidence) library(dplyr) library(gridExtra) library(readxl)
The EpiEstim
package has been extended to allow users to estimate the effective transmission advantage of new pathogens/variants/strains compared to a reference pathogen/variant/strain in real-time. This vignette will take you through the key stages of estimating this advantage and also provide an example of how this method can be applied in a typical outbreak analysis workflow. Please also see the 'FAQs' section.
The incidence data needs to be supplied in the form of a multidimensional array, containing the incidence for each day of the outbreak (row), location (column) and variant/strain under investigation (third dimension), e.g.:
t <- c("[1,]","[2,]","[3,]","[4,]","...") Region_1 <- c(34,21,58,37,"...") Region_2 <- c(102,111,78,150,"...") Region_3 <- c(58,31,72,85,"...") Region_x <- c(62,48,104,72,"...") Region_y <- c(201,230,146,298,"...") Region_z <- c(104,59,91,150,"...") df <- data.frame(t,Region_1,Region_2,Region_3) df2 <- data.frame(t,Region_x,Region_y,Region_z) knitr::kables(list(kable(df, col.names = c(" ","Region_1","Region_2","Region_3"), caption="Variant 1"), kable(df2, col.names = c(" ","Region_1","Region_2","Region_3"), caption="Variant 2")))
The serial interval distributions must be supplied as a matrix with the number of columns matching the number of variants being considered. Each column should contain the probability mass function (PMF) for the discrete serial interval of each variant, starting with the PMF for day zero in the first row (which should be 0) and each column should sum to 1, e.g.:
t <- c("[1,]","[2,]","[3,]","[4,]","...") Variant_1 <- c(0,0.1,0.3,0.5,"...") Variant_2 <- c(0,0.2,0.4,0.6,"...") df <- data.frame(t,Variant_1,Variant_2) knitr::kable(df, col.names = c(" ","Variant_1","Variant_2"))
To estimate the effective transmission advantage of new variants compared to a reference variant we use the estimate_advantage()
function in EpiEstim
.
estimate_advantage()
estimate_advantage(incid = incidence_object, si_distr = si_matrix, priors = default_priors(), mcmc_control = default_mcmc_controls(), t_min = NULL, t_max = nrow(incid), incid_imported = NULL, precompute = TRUE, reorder_incid = TRUE)
As described above, the incid=
and si_distr=
arguments should be supplied with a multidimensional array of incidence and a matrix containing the serial interval for each variant respectively. In addition, the user can supply:
priors=
with a list of the prior parameters (shape and scale) for the reproduction number and the transmission advantage. The default priors can be obtained using the default_priors()
function.
mcmc_control=
with a list of the default properties of the MCMC, i.e. the number of iterations, burn-in and the thinning of the MCMC chains. Burn-in is the number of iterations to be discarded as "warm-up", for instance, if burnin=10 the output is only recorded after 10 iterations have run. Thinning determines how much the MCMC chains should be thinned out, if thin = 10 then 1 in every 10 iterations will be kept. The default parameters can be obtained using default_mcmc_controls()
.
t_min=
and t_max=
with integers that give the minimum and maximum time step over which the transmission advantage will be estimated. Note that cases before t_min
will still be accounted for as potential infectors in the likelihood. Indeed, cases before t_min
can contribute to the overall infectiousness from t_min to t_max through the serial interval distribution. t_min
must be >1 and if t_min = NULL
then this is automatically calculated as the maximum of the 95th percentile of the SI distribution. t_max
must be > t_min
and <= the number of rows of the incidence data. The default value of t_max
is nrow(incid)
.
Other optional parameters include:
incid_imported=
where a multidimensional array can be supplied (row = time, column = location, third dimension = variant) containing the incidence of imported cases, such that incid - incid_imported
would be the incidence of locally infected cases. If incid_imported = NULL
it will be assumed that (other than in the first time step) all cases arose from local transmission.
precompute=
which can be TRUE
or FALSE
, but defaults to TRUE
. This determines whether the shape of the posterior distributions for R and the transmission advantage (for the non-reference variant(s)) for each time step and location should be precalculated. This makes estimate_advantage()
faster, only use precompute=FALSE
for debugging.
reorder_incid=
which can be TRUE
or FALSE
, but defaults to TRUE
. If TRUE
then during the estimation of the transmission advantage the incidence array will be temporarily re-ordered so that all variants are compared to the most transmissible variant (temporarily considered the reference) regardless of the order in which the user supplied the data. Note that the results will always be returned in the order supplied by the user. We recommend that this is set to TRUE
.
This example will take you through a workflow using incidence data for two variants of SARS-CoV-2 (wildtype and alpha) across 7 NHS regions in England. (For detailed description of the data see Bhatia et al. 2021)
incid <- readRDS("./data_mv_vignette/incid.rds")
Incidence
Let us say we have incidence data in the format below, where each row corresponds to a time step in the outbreak, each column corresponds to a region in England, and each third dimension corresponds to a variant.
head(incid)
We also assume that the SI is the same for both variants, with a mean of 5.4 days and a standard deviation of 1.5 days (Rai, Shukla, and Dwivedi 2021) and produce a matrix of SI's with the number of columns equal to the number of variants:
mean_SI <- 5.4 sd_SI <- 1.5 SI <- EpiEstim::discr_si(seq(0, 20), mean_SI, sd_SI) si_matrix <- cbind(SI,SI)
head(si_matrix)
Estimate transmission advantage
Now that we have our incidence and SI distributions we can supply these to the estimate_advantage()
function. We use the default priors and MCMC controls.
output <- EpiEstim::estimate_advantage(incid = incid, si_distr = si_matrix, priors = default_priors(), mcmc_control = default_mcmc_controls())
The output is a list containing 4 elements:
output$epsilon
output$R
In this example, we did not specify t_min=
or t_max=
. You will notice that the R estimates do not start until day 25, this is because:
1) One of the regions doesn't have cases of the alpha variant until day 16,
2) The 95th percentile of the SI is 9 days.
output$convergence
output$diag
coda
). The length of diag is equal to the number of rows in epsilon (i.e. the number of non-reference variants, in this case only 1). Each element of diag contains a named list of the point estimate and upper confidence limits. Summarise results
To summarise the estimated transmission advantage one can use the posterior median and 95% CrI as follows.
quantile(output$epsilon[1,], c(0.5,0.025,0.975))
estimate_advantage()
be interpreted with caution?The inference framework jointly estimates the instantaneous reproduction number of the reference variant and the effective transmission advantage of new variants. R estimates here should be interpreted with caution because they represent estimates from the joint distribution of the reproduction number and transmission advantage, therefore depending on the incidence of the reference as well as the new variant(s). Moreover, temporal variation in estimates of the instantaneous reproduction number can arise from a number of factors, such as the implementation of control measures, changes in population behaviour, or variability in the reporting of cases over time. If only interested in estimating the reference R~t~, we recommend using estimate_R()
.
"The Gelman-Rubin algorithm suggests the MCMC may not have converged within the number of iterations specified."
This means that the MCMC chains for the estimation have not converged. To avoid this you may need to run estimate_advantage()
with more MCMC iterations. E.g. by altering mcmc_control()
.
"Input SI distributions should sum to 1. Normalising now"
The SI distributions supplied have been re-normalised so that they sum to 1.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.