knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5) in_pkgdown <- identical(Sys.getenv("IN_PKGDOWN"), "true")
We are experimenting with an ability to "restart" the simulation at a point in time based on the outcome of a MCMC run. The motivating reason to do this is in our sircovid
model where run an epidemiological model with a number of time varying parameters corresponding to interventions which leads to a number of "waves" of infections. We want to be able to restart the simulation from the trough between two waves so that we can avoid refitting the first peak and better capture changes in parameters as the epidemic unfolds.
To illustrate this problem and the solution we will use a much simpler model which does not show recurrent waves of infection, so we will contrive points to restart. This example picks up from the vignette("sir_models")
vignette.
incidence <- read.csv(system.file("sir_incidence.csv", package = "mcstate")) plot(cases ~ day, incidence, type = "o", xlab = "Day", ylab = "New cases", pch = 19)
history <- readRDS("sir_true_history.rds") history <- t(drop(history)) colnames(history) <- c("S", "I", "R", "cases") history <- data.frame(t = seq_len(nrow(history)) - 1L, history) matplot(history$t, history[c("S", "I", "R")], type = "l", lty = 1, xlab = "Day", ylab = "Variable")
sir <- dust::dust_example("sir") dt <- 0.25 data <- mcstate::particle_filter_data(incidence, time = "day", rate = 1 / dt)
A comparison function
compare <- function(state, observed, pars = NULL) { if (is.na(observed$cases)) { return(NULL) } exp_noise <- 1e6 incidence_modelled <- state[1, , drop = TRUE] incidence_observed <- observed$cases lambda <- incidence_modelled + rexp(n = length(incidence_modelled), rate = exp_noise) dpois(x = incidence_observed, lambda = lambda, log = TRUE) }
An index function for filtering the run state
index <- function(info) { list(run = 5L, state = 1:5) }
Parameter information for the pMCMC, including a roughly tuned kernel so that we get adequate mixing
proposal_kernel <- rbind(c(0.00057, 0.00034), c(0.00034, 0.00026)) pars <- mcstate::pmcmc_parameters$new( list(mcstate::pmcmc_parameter("beta", 0.2, min = 0, max = 1, prior = function(p) log(1e-10)), mcstate::pmcmc_parameter("gamma", 0.1, min = 0, max = 1, prior = function(p) log(1e-10))), proposal = proposal_kernel)
The basic control parameters that we'll use throughout in the runs below:
n_steps <- 500 n_particles <- 100 n_threads <- dust::dust_openmp_threads()
n_steps <- 50 n_particles <- 10 n_threads <- dust::dust_openmp_threads()
Run a pMCMC with r n_particles
particles for the full length of the data. This means that we are fitting to the entire series.
p1 <- mcstate::particle_filter$new(data, sir, n_particles, compare, index = index, n_threads = n_threads) control1 <- mcstate::pmcmc_control(n_steps, save_trajectories = TRUE, save_state = TRUE, save_restart = 40) res1 <- mcstate::pmcmc(pars, p1, control = control1)
Here's the trajectories showing the number of people infected
t <- 0:100 matplot(t, t(res1$trajectories$state[2, , ]), type = "l", lty = 1, col = "#00000011", xlab = "Day", ylab = "Infected") points(I ~ t, history, pch = 19, col = "blue")
And the daily incidence
matplot(t, t(res1$trajectories$state[5, , ]), type = "l", lty = 1, col = "#00000005", xlab = "Day", ylab = "Daily incidence") points(cases ~ day, incidence, col = "blue", pch = 19)
In the mcstate::pmcmc_control
call above we added save_restart = 40
; this means that we save the entire internal state of a single particle at time step 40 (this is the model time "day" time, not the model time step).
The restart state is a 3d array with dimensions corresponding to (1) model state, (2) MCMC sample, (3) restart time (a vector of times can be provided to save_restart
but here just one time was returned).
dim(res1$restart$state)
This sample includes variablility over the pmcmc run:
hist(res1$restart$state[2, , ], xlab = "Number infected", main = "Distribution of number infected over mcmc samples")
This matrix includes all information to restart the stochastic process and we can use this to run a pmcmc that starts the particle filter that runs over data starting from day 40. Note that this is not the same model as it does fit to the first 40 days of the epidemic so we expect parameters to differ (it's a bit like allowing a break in parameters at day = 40 except our initial parameter sets here do include both parts.
The key part is to create an initial
function for the particle function that will create suitable initial conditions. We do this by sampling n_particles
worth of particles out of this matrix. (The initial
function must conform to mcstate's interface so we accept info
and pars
here even though we ignore them). Most cases can use the mcstate::particle_filter_initial
to do this, as below.
initial <- mcstate::particle_filter_initial(res1$restart$state[, , 1])
Then subset the data that we will fit to and create a new particle filter
data2 <- data[data$day_start >= 40, ] p2 <- mcstate::particle_filter$new(data2, sir, n_particles, compare, index = index, initial = initial, n_threads = n_threads)
And run a pMCMC that fits parmeters to the second half of the epidemic
control2 <- mcstate::pmcmc_control(n_steps, save_trajectories = TRUE, save_state = TRUE) res2 <- mcstate::pmcmc(pars, p2, control = control2)
Plot of the number of people infected over time with the overall model (black) and the restarted model (red) with observed data (blue)
t <- 0:100 t2 <- 40:100 matplot(t, t(res1$trajectories$state[2, , ]), type = "l", lty = 1, col = "#00000005", xlab = "Day", ylab = "Infected") matlines(t2, t(res2$trajectories$state[2, , ]), lty = 1, col = "#ff000011") points(I ~ t, history, pch = 19, col = "blue")
Daily incidence, which is what the model actually fits to
matplot(t[-1], diff(t(res1$trajectories$state[4, , ])), type = "l", lty = 1, col = "#00000002", xlab = "Day", ylab = "Daily incidence") matlines(t2[-1], diff(t(res2$trajectories$state[4, , ])), lty = 1, col = "#ff000005") points(cases ~ day, incidence, col = "blue", pch = 19)
The parameter estimates, which are a similar but not identical distribution
plot(res1$pars, xlim = range(res1$pars[, 1], res2$pars[, 1]), ylim = range(res1$pars[, 2], res2$pars[, 2]), pch = 19, col = "#0000ff55") points(res2$pars, col = "#ff000055", pch = 19)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.