knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(EpiEstim) 
library(ggplot2)

The EpiEstim package has been extended to allow users to estimate the time varying reproduction number (R~t~) from temporally aggregated incidence data (Nash et al.). This approach reconstructs daily incidence from data supplied at any timescale. This vignette will take you through the different ways aggregated incidence data can be supplied, the additional parameters needed, and an example using aggregated UK COVID-19 data. Please also see the FAQs section.

## Input data

Incidence data

Many diseases are not reported on a daily basis. EpiEstim can now use incidence data that has been aggregated in multiple ways, e.g.:

Daily incidence data is reconstructed from aggregated data using an expectation-maximisation (EM) algorithm. There are three stages of the EM algorithm:

The daily incidence that is reconstructed after the final iteration of the EM algorithm is then used to estimate R~t~ using the same process as the original estimate_R() function, with sliding weekly time windows used as the default.

Aggregation windows

Aggregation windows can be specified using the parameter dt, and can be provided in one of three ways:

Serial interval distribution

The serial interval can be provided on a daily timescale (as usual), either as the mean and standard deviation (parametric distribution) or the full distribution (non-parametric distribution). See 'full_EpiEstim_vignette' for more details.

Estimate R~t~ from temporally aggregated incidence data

To estimate R~t~ from temporally aggregated incidence data, we simply use the estimate_R() function with two additional parameters required, dt and dt_out, and some optional parameters, recon_opt, iter, tol, and grid.

estimate_R() for aggregated data

estimate_R(incid = aggregated_incidence,
           dt = 7L,
           dt_out = 7L,
           recon_opt = "naive",
           iter = 10L,
           tol = 1e-6,
           grid = list(precision = 0.001, min = -1, max = 1),
           config = config,
           method = method)

There are three other optional parameters that can be modified, however, we recommend that the default values are used:

The SI distibution can be specified as normal using the method and config parameters (see the full_EpiEstim_vignette for more details).

# Examples

Estimate R~t~ from weekly COVID-19 data

This example will take you through a workflow using weekly incidence data for UK COVID-19 cases. (For detailed description of the data see Nash et al.)

incid <- readRDS("./aggregated_data/UK_covid_cases.rds")

Incidence

Let us say we have a vector of weekly incidence data for COVID-19 cases.

incid

We need to specify how the data is aggregated, which in this case, is by constant weekly aggregation windows. We do this by supplying dt with a single integer (7L).

dt <- 7L

We can take an estimate from the literature to specify a parametric SI with a mean of 6.3 days and a standard deviation of 4.2 days (Bi et al 2020).

mean_si <- 6.3
std_si <- 4.2
method <- "parametric_si"
config <- make_config(list(mean_si = mean_si,
                           std_si = std_si))

Estimate R~t~

Now that we have our aggregated incidence, our aggregation time window, and SI distribution, we can supply these to the estimate_R() function. We do not need to specify dt_out, iter, tol, or grid because we are going to use the default values.

output <- EpiEstim::estimate_R(incid = incid,
                               dt = dt,
                               recon_opt = "match",
                               method = method,
                               config = config)

The output consists of multiple elements, including the reconstructed daily incidence data:

head(output$I)

And the R~t~ estimates (in this case, using the default weekly sliding windows):

head(output$R)

In this example, you will notice that R~t~ estimation does not start until day 8. This is because the daily incidence data cannot be reconstructed, and R~t~ estimation cannot start, until the first day of the second aggregation window. The start of R~t~ estimation may also be delayed if incidence is too low, but this was not the case here.

Plot results

As normal, simply plot the full or partial output.

plot(output) # full output

plot(output, "incid") # Reconstructed daily incidence only
plot(output, "R") # Rt estimates only
plot(output, "SI") # SI estimates only

Convergence

Convergence is checked automatically to ensure that the final iteration of the reconstructed daily incidence does not differ from the previous iteration beyond a tolerance of 10$^{-6}$ by default. If convergence is not reached, a warning will be returned and the algorithm can be run with more iterations by modifying iter =. The tolerance threshold can also be modified using tol =.

# FAQs

In order to reconstruct daily incidence data, the method requires that R~t~ is estimated for each aggregation window in turn, which is translated into a growth rate and used to reconstruct daily incidence assuming exponential growth. As there is no past incidence data beyond the first aggregation window, R~t~ cannot be estimated and the daily incidence cannot be reconstructed until the first day of the second aggregation window.

Additionally, R~t~ estimation will not start until case numbers are sufficiently high.

Please also see the FAQ section in the main "EpiEstim Vignette".

References

Nash RK, Cori A, Nouvellet P. Estimating the epidemic reproduction number from temporally aggregated incidence data: a statistical modelling approach and software tool. medRxiv pre-print.

Bi Q, et al. Epidemiology and transmission of COVID-19 in 391 cases and 1286 of their close contacts in Shenzhen, China: a retrospective cohort study. Lancet. 2020.

Wallinga J, Lipsitch M. How generation intervals shape the relationship between growth rates and reproductive numbers. Proceedings of the Royal Society B: Biological Sciences. 2007 Feb 22;274(1609):599–604.



annecori/EpiEstim documentation built on Oct. 14, 2023, 1:54 a.m.