knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "80%", fig.height = 3, fig.align = "center" ) library(tidyverse) theme_set(theme_bw()) theme_update( #axis.text.x = ggplot2::element_text(size = 25), text = element_text(size = 15) )
The PDSIR
R package implements an efficient data augmentation MCMC (DA-MCMC) algorithm for exact Bayesian inference under the semi-Markov stochastic susceptible-infectious-removed (SIR) model, given discretely observed counts of infections. The novelty of this DA-MCMC algorithm is the joint update of the high-dimensional latent data. In a Metropolis-Hastings step, the latent data are jointly proposed from a surrogate process carefully designed to closely resemble the target process and from which we can efficiently generate epidemics consistent with the observed data. This yields a MCMC algorithm that explores the high-dimensional latent space efficiently, mixes significantly better than single-site samplers, and scales to outbreaks with thousands of infections.
The package also contains data from the 2013-2015 outbreak of Ebola in Western Africa.
You can install the development version of PDSIR from GitHub with:
# install.packages("devtools") devtools::install_github("rmorsomme/PDSIR")
We start by generating artificial data from a semi-Markov SIR process with Weibull-distributed infection periods. We stop the process at time t_end=4
, when the outbreak is not completely over $I(t_{end})=4$.
library(PDSIR) # Setup set.seed(1) S0 <- 150 # initial number of susceptible individuals I0 <- 5 # initial number of infectious individuals t_end <- 4 # end of observation period iota_dist <- "weibull" # distribution of the infection periods theta <- list( R0 = 2, # basic reproduction number lambda = 1, shape = 2 # parameters of the Weibull distribution for the infection periods ) # Simulate artificial data SIR <- simulate_SEM(S0, I0, t_end, theta, iota_dist) # Trajectories of compartments draw_trajectories(SIR, t_end)
The observed data consist the number of infections in pre-specified intervals. here, we consider K=10
intervals of equal length.
K <- 10 # number of observation intervals Y <- observed_data(SIR, K) print(Y$ts ) # endpoints of the intervals print(Y$I_k) # number of infections per interval
We run the DA-MCMC algorithm for N=50,000
iterations using the (default) weakly informative prior $\beta \sim G(0.01, 1)$ and $\lambda \sim Ga(1,1)$ independently. The entire latent data are updated each iteration (rho=1
, by default). The sampler is fast and achieves a healthy acceptance rate in the Metropolis-Hastings step for the latent data.
out <- run_DAMCMC(Y, N = 5e4, iota_dist = iota_dist, theta_0 = theta) print(out$run_time) # run time in seconds print(out$rate_accept) # acceptance rate in the Metropolis-Hastings step for the latent data
The traceplot for $R_0$ indicates that the chain mixes well. The true value is $R_0=2$ (red dotted line).
library(tidyverse) THETA <- out$theta %>% data.table::rbindlist() %>% tibble() %>% mutate(Iteration = 1:nrow(.)) THETA %>% ggplot(aes(x = Iteration, y = R0)) + geom_line() + geom_hline(yintercept = theta$R0, col = "red", linewidth = 1.5, linetype = 2) + labs(y = expression(R[0]))
We now turn to a case study of the \$2013\$-\$2015\$ outbreak of Ebola Haemorrhagic Fever in Western Africa. We consider the prefecture Gueckedou in Guinea, where a total of r sum(ebola_gueckedou$I_k)
infections were observed between December 2013 and May 2015. The observed data consist of the number of infections in each week, which are shown below.
tibble( time = ebola_gueckedou$dates, I_k = ebola_gueckedou$I_k ) %>% ggplot(aes(time, I_k)) + geom_col() + labs(x = "Time (weeks)", y = "Weekly counts of\ninfections")
# set up rm(list=ls()) set.seed(1) S0 <- 292e3 # https://en.wikipedia.org/wiki/Prefectures_of_Guinea I0 <- 5 t_end <- max(ebola_gueckedou$ts) ebola_gueckedou$S0 <- S0 ebola_gueckedou$I0 <- I0 ebola_gueckedou$t_end <- t_end iota_dist <- "weibull" theta_0 <- list(R0 = 1, lambda = 0.01, shape = 2) N <- 1e6 thin <- 1e2 # run MCMC out <- run_DAMCMC( ebola_gueckedou, N = N, rho = 0.1, iota_dist = iota_dist, thin = thin, theta_0 = theta_0 ) print(out$run_time) # seconds print(out$rate_accept)
THETA <- out$theta %>% data.table::rbindlist() %>% tibble() %>% mutate( iteration = (1:nrow(.))*thin, iteration_1000 = iteration / 1000, EIP = lambda^{-1/shape}*gamma(1+1/shape) # expectation of weibull distribution ) THETA %>% ggplot(aes(x = EIP)) + geom_histogram(aes(y = after_stat(density))) + labs(x = "Expection of infection period", y = "Density")
THETA %>% ggplot(aes(x = iteration_1000, y = EIP)) + geom_line() + labs(x = "Iteration (in 1,000)", y = "Expectation of\ninfection period")
The GitHub repository PDSIR-article contains the R
scripts for reproducing the analyses and figures present in the paper Exact Inference for Stochastic Epidemic Models via Uniformly Ergodic Block Sampling by R. Morsomme and J. Xu available on ArXiv.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.