In rmorsomme/PDSIR: Data-augmentation Marko chain Monte Carlo for Fitting the Stochastic SIR Model to Incidence Counts

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "80%",
  fig.height = 3,
  fig.align = "center"
)

library(tidyverse)
theme_set(theme_bw())
theme_update(
  #axis.text.x = ggplot2::element_text(size = 25),
  text = element_text(size = 15)
  )

PDSIR

The PDSIR R package implements an efficient data augmentation MCMC (DA-MCMC) algorithm for exact Bayesian inference under the semi-Markov stochastic susceptible-infectious-removed (SIR) model, given discretely observed counts of infections. The novelty of this DA-MCMC algorithm is the joint update of the high-dimensional latent data. In a Metropolis-Hastings step, the latent data are jointly proposed from a surrogate process carefully designed to closely resemble the target process and from which we can efficiently generate epidemics consistent with the observed data. This yields a MCMC algorithm that explores the high-dimensional latent space efficiently, mixes significantly better than single-site samplers, and scales to outbreaks with thousands of infections.

The package also contains data from the 2013-2015 outbreak of Ebola in Western Africa.

Installation

You can install the development version of PDSIR from GitHub with:

# install.packages("devtools")
devtools::install_github("rmorsomme/PDSIR")

Illustration on synthetic data

We start by generating artificial data from a semi-Markov SIR process with Weibull-distributed infection periods. We stop the process at time t_end=4, when the outbreak is not completely over $I(t_{end})=4$.

library(PDSIR)

# Setup
set.seed(1)
S0 <- 150  # initial number of susceptible individuals
I0 <- 5    # initial number of infectious individuals

t_end <- 4 # end of observation period

iota_dist <- "weibull" # distribution of the infection periods
theta <- list(
  R0 = 2,               # basic reproduction number
  lambda = 1, shape = 2 # parameters of the Weibull distribution for the infection periods
  )


# Simulate artificial data
SIR <- simulate_SEM(S0, I0, t_end, theta, iota_dist)


# Trajectories of compartments
draw_trajectories(SIR, t_end)

The observed data consist the number of infections in pre-specified intervals. here, we consider K=10 intervals of equal length.

K <- 10 # number of observation intervals
Y <- observed_data(SIR, K)

print(Y$ts ) # endpoints of the intervals
print(Y$I_k) # number of infections per interval

We run the DA-MCMC algorithm for N=50,000 iterations using the (default) weakly informative prior $\beta \sim G(0.01, 1)$ and $\lambda \sim Ga(1,1)$ independently. The entire latent data are updated each iteration (rho=1, by default). The sampler is fast and achieves a healthy acceptance rate in the Metropolis-Hastings step for the latent data.

out <- run_DAMCMC(Y, N = 5e4, iota_dist = iota_dist, theta_0 = theta) 

print(out$run_time)    # run time in seconds
print(out$rate_accept) # acceptance rate in the Metropolis-Hastings step for the latent data

The traceplot for $R_0$ indicates that the chain mixes well. The true value is $R_0=2$ (red dotted line).

library(tidyverse)

THETA <- out$theta %>%
      data.table::rbindlist() %>%
      tibble() %>%
      mutate(Iteration = 1:nrow(.))

THETA %>%
  ggplot(aes(x = Iteration, y = R0)) +
  geom_line() +
  geom_hline(yintercept = theta$R0, col = "red", linewidth = 1.5, linetype = 2) +
  labs(y = expression(R[0]))

Illustration on the 2013-2015 outbreak of Ebola in Western Africa

We now turn to a case study of the \$2013\$-\$2015\$ outbreak of Ebola Haemorrhagic Fever in Western Africa. We consider the prefecture Gueckedou in Guinea, where a total of r sum(ebola_gueckedou$I_k) infections were observed between December 2013 and May 2015. The observed data consist of the number of infections in each week, which are shown below.

tibble(
  time = ebola_gueckedou$dates,
  I_k  = ebola_gueckedou$I_k
  ) %>%
  ggplot(aes(time, I_k)) +
  geom_col() +
  labs(x = "Time (weeks)", y = "Weekly counts of\ninfections")

# set up
rm(list=ls())
set.seed(1)

S0 <- 292e3 # https://en.wikipedia.org/wiki/Prefectures_of_Guinea
I0 <- 5
t_end <- max(ebola_gueckedou$ts)

ebola_gueckedou$S0 <- S0
ebola_gueckedou$I0 <- I0
ebola_gueckedou$t_end <- t_end

iota_dist <- "weibull"
theta_0 <- list(R0 = 1, lambda = 0.01, shape = 2)  

N    <- 1e6
thin <- 1e2

# run MCMC
out <- run_DAMCMC(
    ebola_gueckedou, N = N, rho = 0.1, iota_dist = iota_dist, thin = thin, theta_0 = theta_0
)


print(out$run_time) # seconds
print(out$rate_accept)

THETA <- out$theta %>%
      data.table::rbindlist() %>%
      tibble() %>%
      mutate(
        iteration = (1:nrow(.))*thin,
        iteration_1000 = iteration / 1000,
        EIP = lambda^{-1/shape}*gamma(1+1/shape) # expectation of weibull distribution
        )

THETA %>%
  ggplot(aes(x = EIP)) +
  geom_histogram(aes(y = after_stat(density))) +
  labs(x = "Expection of infection period", y = "Density")

THETA %>%
  ggplot(aes(x = iteration_1000, y = EIP)) +
  geom_line() +
  labs(x = "Iteration (in 1,000)", y = "Expectation of\ninfection period")

Reproducing the results in the paper Exact Inference for Stochastic Epidemic Models via Uniformly Ergodic Block Sampling

The GitHub repository PDSIR-article contains the R scripts for reproducing the analyses and figures present in the paper Exact Inference for Stochastic Epidemic Models via Uniformly Ergodic Block Sampling by R. Morsomme and J. Xu available on ArXiv.

rmorsomme/PDSIR documentation built on April 27, 2023, 2:56 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

rmorsomme/PDSIR
Data-augmentation Marko chain Monte Carlo for Fitting the Stochastic SIR Model to Incidence Counts

In rmorsomme/PDSIR: Data-augmentation Marko chain Monte Carlo for Fitting the Stochastic SIR Model to Incidence Counts

PDSIR

Installation

Illustration on synthetic data

Illustration on the 2013-2015 outbreak of Ebola in Western Africa

Reproducing the results in the paper Exact Inference for Stochastic Epidemic Models via Uniformly Ergodic Block Sampling

R Package Documentation

Browse R Packages

We want your feedback!

rmorsomme/PDSIR Data-augmentation Marko chain Monte Carlo for Fitting the Stochastic SIR Model to Incidence Counts

In rmorsomme/PDSIR: Data-augmentation Marko chain Monte Carlo for Fitting the Stochastic SIR Model to Incidence Counts

PDSIR

Installation

Illustration on synthetic data

Illustration on the 2013-2015 outbreak of Ebola in Western Africa

Reproducing the results in the paper Exact Inference for Stochastic Epidemic Models via Uniformly Ergodic Block Sampling

R Package Documentation

Browse R Packages

We want your feedback!

rmorsomme/PDSIR
Data-augmentation Marko chain Monte Carlo for Fitting the Stochastic SIR Model to Incidence Counts