knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(kableExtra)
library(hrbrthemes)
library(gghighlight)
library(MCMCglmm)
library(projections)
library(EpiEstim) 
library(ggplot2)
library(incidence)
library(dplyr)
library(gridExtra)
library(readxl)

The EpiEstim package has been extended to allow users to estimate the effective transmission advantage of new pathogens/variants/strains compared to a reference pathogen/variant/strain in real-time. This vignette will take you through the key stages of estimating this advantage and also provide an example of how this method can be applied in a typical outbreak analysis workflow. Please also see the 'FAQs' section.

## Input data

The incidence data needs to be supplied in the form of a multidimensional array, containing the incidence for each day of the outbreak (row), location (column) and variant/strain under investigation (third dimension), e.g.:

t <- c("[1,]","[2,]","[3,]","[4,]","...")
Region_1 <- c(34,21,58,37,"...")
Region_2 <- c(102,111,78,150,"...")
Region_3 <- c(58,31,72,85,"...")
Region_x <- c(62,48,104,72,"...")
Region_y <- c(201,230,146,298,"...")
Region_z <- c(104,59,91,150,"...")

df <- data.frame(t,Region_1,Region_2,Region_3)
df2 <- data.frame(t,Region_x,Region_y,Region_z)

knitr::kables(list(kable(df, col.names = c(" ","Region_1","Region_2","Region_3"), caption="Variant 1"),
                   kable(df2, col.names = c(" ","Region_1","Region_2","Region_3"), caption="Variant 2")))

The serial interval distributions must be supplied as a matrix with the number of columns matching the number of variants being considered. Each column should contain the probability mass function (PMF) for the discrete serial interval of each variant, starting with the PMF for day zero in the first row (which should be 0) and each column should sum to 1, e.g.:

t <- c("[1,]","[2,]","[3,]","[4,]","...")
Variant_1 <- c(0,0.1,0.3,0.5,"...")
Variant_2 <- c(0,0.2,0.4,0.6,"...")

df <- data.frame(t,Variant_1,Variant_2)

knitr::kable(df, col.names = c(" ","Variant_1","Variant_2"))

Estimate advantage

To estimate the effective transmission advantage of new variants compared to a reference variant we use the estimate_advantage() function in EpiEstim.

estimate_advantage()

estimate_advantage(incid = incidence_object, 
                   si_distr = si_matrix, 
                   priors = default_priors(),
                   mcmc_control = default_mcmc_controls(),
                   t_min = NULL, t_max = nrow(incid),
                   incid_imported = NULL,
                   precompute = TRUE,
                   reorder_incid = TRUE)

As described above, the incid= and si_distr= arguments should be supplied with a multidimensional array of incidence and a matrix containing the serial interval for each variant respectively. In addition, the user can supply:

Other optional parameters include:

# Examples

SARS-CoV-2 variants

This example will take you through a workflow using incidence data for two variants of SARS-CoV-2 (wildtype and alpha) across 7 NHS regions in England. (For detailed description of the data see Bhatia et al. 2021)

incid <- readRDS("./data_mv_vignette/incid.rds")

Incidence

Let us say we have incidence data in the format below, where each row corresponds to a time step in the outbreak, each column corresponds to a region in England, and each third dimension corresponds to a variant.

head(incid)

We also assume that the SI is the same for both variants, with a mean of 5.4 days and a standard deviation of 1.5 days (Rai, Shukla, and Dwivedi 2021) and produce a matrix of SI's with the number of columns equal to the number of variants:

mean_SI <- 5.4
sd_SI <- 1.5
SI <- EpiEstim::discr_si(seq(0, 20), mean_SI, sd_SI)
si_matrix <- cbind(SI,SI)
head(si_matrix)

Estimate transmission advantage

Now that we have our incidence and SI distributions we can supply these to the estimate_advantage() function. We use the default priors and MCMC controls.

output <- EpiEstim::estimate_advantage(incid = incid,
                             si_distr = si_matrix,
                             priors = default_priors(),
                             mcmc_control = default_mcmc_controls())

The output is a list containing 4 elements:

output$epsilon
output$R

In this example, we did not specify t_min= or t_max=. You will notice that the R estimates do not start until day 25, this is because: 1) One of the regions doesn't have cases of the alpha variant until day 16, 2) The 95th percentile of the SI is 9 days.

output$convergence
output$diag

Summarise results

To summarise the estimated transmission advantage one can use the posterior median and 95% CrI as follows.

quantile(output$epsilon[1,], c(0.5,0.025,0.975))
# FAQs

The inference framework jointly estimates the instantaneous reproduction number of the reference variant and the effective transmission advantage of new variants. R estimates here should be interpreted with caution because they represent estimates from the joint distribution of the reproduction number and transmission advantage, therefore depending on the incidence of the reference as well as the new variant(s). Moreover, temporal variation in estimates of the instantaneous reproduction number can arise from a number of factors, such as the implementation of control measures, changes in population behaviour, or variability in the reporting of cases over time. If only interested in estimating the reference R~t~, we recommend using estimate_R().

"The Gelman-Rubin algorithm suggests the MCMC may not have converged within the number of iterations specified."

This means that the MCMC chains for the estimation have not converged. To avoid this you may need to run estimate_advantage() with more MCMC iterations. E.g. by altering mcmc_control().

"Input SI distributions should sum to 1. Normalising now"

The SI distributions supplied have been re-normalised so that they sum to 1.



annecori/EpiEstim documentation built on Oct. 14, 2023, 1:54 a.m.