In davidchampredon/gitest: Generation intervals

The GI package implements the theoretical results published in:

[1] Champredon D, Dushoff J. Intrinsic and realized generation intervals in infectious-disease transmission. Proceedings of the Royal Society B: Biological Sciences (2015); 282: 20152026. open access link

Summary of theoretical framework

The generation interval is the interval between the time when an individual is infected by an infector and the time when this infector was infected. There are three conceptually different generation interval distributions:

intrinsic generation interval: defines the transmission process in the mathematical model
forward generation interval: observed when a cohort is tracked forward in time to see whom individuals infect
backward generation interval: observed when a cohort is investigated by looking backward in time to see who infected each individual. This generation interval is important when fitting a model to contact tracing data.

For a classical compartmental model (like the SIR model), the generation interval distributions are not explicitly defined and calculating them is not straightforward.

Similarly, when using a renewal equation to model an epidemic, although the intrinsic generation interval is explicitly defined, both the backward and forward generation intervals are not explicitly defined and calculating them is not straightforward.

The GI package provides easy-to-use functions to calculate the various generation interval distributions.

Epidemic models implemented

SEmInR

The SEmInR model is an extension of the popular SEIR model, where $m$ E ("exposed") compartments and $n$ I ("infectious") compartments were introduced artificially in order to have latent and infectious durations distributed more realistically. See [1] for more details on this model.

RESuDe

Although less popular than compartmental models, it is also possible to model infectious disease transmission with Lotka-Euler's renewal equation. If we note $S_t$ the number of susceptible at time $t$, $I_t$ the incidence at time $t$ (not the prevalence!), $\mathcal{R}0$ the basic reproduction number, $N$ the constant total population size and $g$ the _intrinsic generation interval distribution, we have: $$ I_t = \mathcal{R}0 \frac{S_t}{N}\sum{s=1}^t g(s)\, I_{t-s}$$ $$ S_t = S_{t-1} - I_t$$

Note that this model is defined at discrete times. The susceptible depletion given by the second equation prevent the model to have an exponential growth for ever. This feature defines the label of this model: Renewal Equation with Suceptible Depletion, or "RESuDe".

This model can be slightly more sophisticated and we can introduce:

mixing heterogeneity by adding a multiplicative factor $(S_t/N)^\alpha$ for $\alpha\geq 0$
intervention effects by adding a multiplicative factor that decreases the force of infection through time $e^{-\kappa t}$ for $\kappa\geq 0$

Hence, the first equation becomes (the second equation, susceptible depletion, is unchanged): $$ I_t = \mathcal{R}0 \left(\frac{S_t}{N}\right)^\alpha e^{-\kappa t} \sum{s=1}^t g(s)\, I_{t-s}$$

Example to visualize the temporal evolution of GI distributions

Example with SEmInR model

Let's start by defining the model parameters:

library(plyr)
library(GI)
latent_mean     <- 5  # Mean duration of latent period (in days)
infectious_mean <- 5  # Mean duration of infectious period (in days)
R0 <- 4
nE <- 3  # Number of artifical 'E' compartments
nI <- 3  # Number of artifical 'E' compartments
horizon <- 100  # make sure simulations last long enough, until the end of the epidemic.

Now, let's define the (calendar) times where we want to calculate the forward and backward generation interval distributions (the intrinsic one is constant througout the epidemic):

cal.times.fwdbck <- seq(5, horizon-10, by=5)

Let's call the GI.seminr function to calculate all three generation interval distributions at these calendar times:

g <- GI.seminr(latent_mean, infectious_mean, R0, nE,nI, cal.times.fwdbck,horizon)

The results are stored in list, so lets retrieve them:

g.int <- g[['intrinsic']]
g.fwd <- g[['fwd']]
g.bck <- g[['bck']]
t     <- g[['cal.times']]
inc   <- g[['incidence']]

In order to understand the temporal evolution of the forward and backward generation interval distributions, let's make some plots. First, we plot the incidence time series generated by the model, as this is one of the fundamental component affecting forward and backward generation interval distributions.

plot(x=t, y=inc, typ='l', 
     main = "SEmInR simulated incidence",
     xlab = 'calendar time', ylab='', las=1,lwd=2) ; grid()

Now, let's look at the intrinsic generation interval distribution:

plot(g.int$tsi, g.int$density, 
     typ='l', col='red', lwd=3,
     xlim = c(0,30),
     xlab = 'time since infection', ylab ='',yaxt='n',
     main='Intrinsic generation interval distribution')

This intrinsic distribution will not change throughout the epidemic.

Now, let's look at the forward generation interval distributions, at each calendar times requested in cal.times.fwdbck.

plot(x=g.fwd$tsi[[1]], y = g.fwd$density[[1]],
     main = 'Forward generation interval distribution',
     xlab = 'time since infection', ylab ='',yaxt='n',
     xlim = c(0,30), col=rgb(0,0,0,0.2),
     typ='l',ylim=c(0,0.2))
for(i in 1:length(g.fwd$tsi)) 
    lines(x=g.fwd$tsi[[i]],g.fwd$density[[i]],col=rgb(0,0,0,0.2))

Each grey line represents the distribution at a different calendar times, we can see that it is different.

Apart from the fact that the distribution changes, it is not really clear to see a pattern. So let's plot the mean of the forward generation interval distribution throughout the epidemic (i.e. calendar times):

fbar <- g[['fwd.mean']]
plot(x=cal.times.fwdbck, y=fbar,
     typ='o', las = 1, 
     main = 'Mean of the forward generation interval distribution',
     xlab = 'calendar time', ylab ='')
grid()

Each circle is the mean of the distribution calculated at a given calendar time.

We can do exactly the same analysis for the backward generation interval distribution:

plot(x=g.bck$tsi[[length(g.bck$tsi)]], g.bck$density[[length(g.bck$tsi)]],
     main = 'Backward generation interval distribution',
     xlab = 'time since infection', ylab ='', yaxt='n',
     xlim = c(0,30),
     typ='l',ylim=c(0,0.2),col=rgb(0,0,0,0.2))
for(i in 1:length(g.bck$tsi)) 
    lines(x=g.bck$tsi[[i]],g.bck$density[[i]],col=rgb(0,0,0,0.2))

bbar <- g[['bck.mean']]
plot(cal.times.fwdbck,bbar,
     typ='o',las = 1,
     main = 'Mean of the backward generation interval distribution',
     xlab = 'calendar time', ylab ='')
grid()

Example with RESuDe model

The exact same analysis as the one done with a SEmInR can be performed with a RESuDe model. The only difference is the way to set up the RESuDe model and the function to call is now GI.resude (instead of GI.seminr). Also, because RESuDe is defined by explictly giving the intrinsic generation interval distribution (with the GI_type, GI_mean and GI_var parameters), only the forward and backward generation interval distribution are calculated in this case.

Here is the code to replicate with RESuDe what was done for SEmInR.

library(GI)

### Parameters for RESuDe model
R0 <- 4
horizon <- 100
I.init <- 1
pop_size <- 1e4
alpha <- 0
kappa <- 0
GI_type <- 'pois'  # 'pois' or 'nbinom'
GI_span <- 20
GI_mean <- 8
GI_var <- GI_mean + 2  # used only if GI_type='nbinom'

# Calendar times where the forward and 
# backward generation interval distributions
# will be calculated:
cal.times.fwdbck <- seq(GI_span-2, horizon - 4, by = 1)

# Calculate generation interval distributions:
G <- GI.resude(cal.times.fwdbck,
                R0,
                alpha,
                kappa,
                GI_span,
                GI_mean,
                GI_var,
                GI_type,
                horizon,
                pop_size = pop_size,
                I.init = I.init)

# Retrieve results:
gi <- G$intrinsic
gbck <- G$bck
gfwd <- G$fwd


###  PLOTS  ###

par(mfrow=c(1,1)) 

# Epidemic time series
plot(G$susceptible,
     xlab = 'calendar time', ylab='', 
     main='Number of susceptibles',
     typ='l')
grid()
plot(G$incidence,typ='l',
     xlab = 'calendar time', ylab='', 
     main = 'Incidence')
grid()

# GI Distributions:
plot(x=gi$tsi, y=gi$density,
     main = 'Backward GI distributions\n at requested calendar times',
     xlab = 'time since infection', ylab='',yaxt='n',
     ylim = c(0,1.2*max(gi$density)),
     pch=16)
for(i in 1:length(cal.times.fwdbck)){
    gb <- gbck$density[[i]]
    lines(x = gbck$tsi[1:length(gb)], y = gb, typ='l', col = rgb(0,0,1,0.2))
}
legend(x='topright',lwd = c(1,0),legend = c('backward','intrinsic'),pch=c(NA,16))

plot(x=gi$tsi, y=gi$density,
     main = 'Forward GI distributions\n at requested calendar times',
     xlab = 'time since infection', ylab='',yaxt='n',
     ylim = c(0,1.2*max(gi$density)),
     pch=16)
for(i in 1:length(cal.times.fwdbck)){
    lines(x = gfwd$tsi, y = gfwd$density[[i]], typ='l', col=rgb(0,.5,0,0.1))
}
legend('topright',lwd = c(1,0),legend = c('forward','intrinsic'),pch=c(NA,16))


plot(x=cal.times.fwdbck, y=G$fwd.mean, 
     xlim = c(0,horizon),
     main = 'Mean of the forward GI distribution\n at requested calendar times',
     xlab = 'calendar time', ylab='', 
     las = 1,
     typ='o')
abline(h=GI_mean, lty=2)
grid()
plot(x=cal.times.fwdbck, y=G$bck.mean, 
     xlim = c(0,horizon),
     main = 'Mean of the forward GI distribution\n at requested calendar times',
     xlab = 'calendar time', ylab='', 
     las = 1,
     typ='o')
abline(h=GI_mean, lty=2)
grid()

Contact-tracing data example

In an epidemic investigation of an on-going outbreak, contact tracing typically provides backward generation interval: an infectee was infected at time $t_1$ and her identified infector was infected at time $t_0$. The backward generation interval is simply $t_1-t_0$. Hence, for a mathematical model used to analyze this outbreak, if contact tracing data is available and the modeller is willing to fit this model to such data, it is important to fit the backward generation interval (not the intrinsic, as may happen with a naive approach).

Generating simulated data

First, we simulate data from a fully known model and add a layer of observation error on the backward GIs. Here we use a SEmInR model to generate the data:

set.seed(12345)

# Simulate 'true' data from a SEmInR model:
R0.true <- 3
dt <- 0.25
hz <- 100
nE <- nI <- 4
latent_mean <- 2
infectious_mean <- 4

# Calendar times when the GI were observed:
t.obs <- sort(sample(x = 1:(hz/2), 
                     size = 30, 
                     replace = TRUE))

# "True" model:
G.true <- GI.seminr(latent_mean = latent_mean,
                    infectious_mean = infectious_mean, 
                    R0 = R0.true, 
                    nE = nE, 
                    nI = nI,
                    cal.times.fwdbck = t.obs,
                    horizon = hz, 
                    dt = dt)

# Retrieve the mean backward GI from the "true" model:
gibck.true <-  G.true$bck.mean

# Imperfectly observed GI at the associated calendar times.
# Assume the ovservation error process is Poisson-distributed:
gi.obs <- rpois(n = length(gibck.true),lambda = gibck.true)
gi.obs[gi.obs==0] <- 1

After generating the data, we have the following "observed" backward GIs:

par(mfrow=c(1,1))
# plot(x=1:length(G.true$incidence) * dt, 
#      y=G.true$incidence, 
#      typ='l', xlab='day',ylab='incidence') ; grid()
plot(t.obs, gibck.true, 
     typ='l',
     lty = 2,
     las = 1,
     xlab = 'calendar time',
     ylab = 'Backward GI',
     ylim=range(c(0,gibck.true,gi.obs)))
grid()
points(x=t.obs, y=gi.obs, pch=16)
legend(x = 'topleft', 
       pch = c(NA,16), 
       lty = c(2,NA),
       legend = c('Simulated mean bckwd GI', 'Observed'))

The function gi_ct_fit provides a convienent way to fit the basic reproductive number ($R_0$) and the mean intrinsic generation interval ($\bar{g}$) of an epidemiological model to observed backward GIs. It is implemented to fit two types of epidemiological models, the SEmInR and RESuDe (both defined earlier in this vignette). The fit is done holding all other parameters of the model constant. Hence we need to define the fixed parameters, for each model:

# Model fixed parameters (not fitted):
fxd.prm.seminr <- list(horizon=hz, nE=nE, nI=nI, latent_mean=latent_mean, dt=dt)
fxd.prm.resude <- list(horizon=hz, alpha=0, kappa=0, GI_span=20, 
                       GI_var=NULL, GI_type='pois', dt=dt)

Next, we need to specify the range of values explored for $R_0$ and $\bar{g}$. The fitting algorithm is simply going to explore the parameter space defined, and construct a likelihood surface from which the minimum and the confidence intervals will be extracted.

R0.rng     <- seq(1, 6, by=0.25)
gimean.rng <- seq(2, 8, by=0.5)
CI <- 0.95
do.plot <- TRUE

# Run fit for both SEmInR and RESuDe models:
fit.seminr <- gi_ct_fit(t.obs = t.obs,
          gi.obs = gi.obs,
          model.epi = 'seminr',
          fxd.prm = fxd.prm.seminr,
          R0.rng = R0.rng,
          gimean.rng = gimean.rng,
          CI = CI,
          do.plot = do.plot)

fit.resude <- gi_ct_fit(t.obs = t.obs, 
          gi.obs = gi.obs, 
          model.epi = 'resude', 
          fxd.prm = fxd.prm.resude,
          R0.rng = R0.rng, 
          gimean.rng = gimean.rng,
          CI = CI,
          do.plot = do.plot)