knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.asp = 0.62, fig.width = 5 ) options(future.rng.onMisuse = "ignore")
With longer time series, you may wish to split the model fitting into "blocks" for speed. This approach will let you fit a model once for the time series until a recent date (say a couple months ago), and then focus on fitting the more recent data separately, perhaps with full MCMC sampling given that the fitting will be considerably faster.
The following is an example of splitting the dataset in late May. You can split the data wherever and as frequently as you want providing the following is true:
You must also be OK with various otherwise fixed parameters (phi, R0, e, i0) being allowed to vary across blocks. The posterior from the fit will be used as the prior for the next. This means the information from the first block will be preserved, but the next block posterior may be different from the previous when additional data are added. Visually, the most obvious effect might be the NB2 dispersion parameter phi: the credible intervals may appear smaller or wider in different splits. One approach to deal with this is to allow for some overlap (say a week) between two fits and then average across them, as is done in the following example.
library(covidseir) library(dplyr) library(ggplot2) ymd <- lubridate::ymd
First, we will read in the British Columbia, Canada COVID-19 reported-case data. This comes from http://www.bccdc.ca/Health-Info-Site/Documents/BCCDC_COVID19_Dashboard_Case_Details.csv.
dat <- structure(list(value = c(1, 1, 4, 6, 4, 6, 1, 3, 5, 11, 20, 12, 25, 12, 27, 46, 41, 60, 47, 62, 46, 43, 88, 59, 91, 62, 62, 37, 30, 82, 56, 47, 55, 31, 21, 13, 51, 40, 33, 34, 41, 28, 15, 30, 44, 14, 51, 25, 30, 15, 34, 63, 33, 37, 72, 61, 15, 58, 41, 26, 29, 23, 33, 21, 16, 24, 29, 21, 14, 8, 17, 10, 13, 16, 14, 21, 8, 11, 5, 19, 11, 22, 12, 4, 9, 8, 7, 10, 5, 11, 9, 16, 4, 23, 2, 4, 12, 7, 9, 10, 14, 12, 17, 14, 14, 9, 11, 19, 9, 5, 9, 6, 17, 11, 12, 21, 12, 9, 13, 3, 12, 17, 10, 9, 12, 16, 7, 10, 19, 18, 22, 25, 23, 20, 13, 19, 24, 33, 44, 19, 29, 33, 34, 32, 30, 30, 21, 21, 24, 47, 26, 39, 52, 29, 48, 28, 43, 46, 47, 51, 38, 42, 46, 86, 73, 90, 100, 86, 51, 77, 65, 84, 90, 115, 79, 81), day = 1:176), row.names = c(NA, -176L), class = "data.frame") dat <- dplyr::as_tibble(dat) dat$date <- ymd("2020-03-01") + dat$day - 1
ggplot(dat, aes(date, value)) + geom_line()
See the 'fitting-case-data' vignette for details on the model setup. Here, we will use the same initial code blocks.
# Based on estimation with hospital data in other model: samp_frac <- c(rep(0.14, 13), rep(0.21, 38)) samp_frac <- c(samp_frac, rep(0.37, nrow(dat) - length(samp_frac)))
f_seg <- c(0, rep(1, nrow(dat) - 1)) day_new_f <- which(dat$date == ymd("2020-05-01")) f_seg[seq(day_new_f, length(f_seg))] <- 2 day_ch <- which(dat$date == ymd("2020-06-01")) f_seg[seq(day_ch, length(f_seg))] <- 3
first_day <- min(dat$date) lut <- dplyr::tibble( day = seq_len(300), date = seq(first_day, first_day + length(day) - 1, by = "1 day") )
First, let's split the data into two blocks with a week overlap in the middle. Note that f is assumed constant in the 30 days prior. You could also avoid the overlap if you would prefer.
dat1 <- filter(dat, date < ymd("2020-05-29")) dat2 <- filter(dat, date >= ymd("2020-05-22")) samp_frac1 <- samp_frac[seq_len(nrow(dat1))] samp_frac2 <- samp_frac[dat2$day] f_seg1 <- f_seg[seq_len(nrow(dat1))] f_seg2 <- f_seg[dat2$day]
Now we need to adjust the f_seg
indexes to start at 1 in the second block and set the first element to 0:
f_seg2 <- f_seg2 - min(f_seg2) + 1 f_seg2[1] <- 0 f_seg2
Fit the first model:
fit1 <- fit_seir( daily_cases = dat1$value, samp_frac_fixed = samp_frac1, f_seg = f_seg1, i0_prior = c(log(8), 1), e_prior = c(0.8, 0.05), start_decline_prior = c(log(15), 0.1), end_decline_prior = c(log(22), 0.1), f_prior = cbind(c(0.4, 0.5), c(0.2, 0.2)), R0_prior = c(log(2.6), 0.2), N_pop = 5.1e6, # BC population iter = 200, fit_type = "optimizing" # for speed only )
Use the function covidseir::post2prior()
to convert the posteriors to the priors. The output is a named list:
priors <- post2prior(fit1, iter = seq_len(50)) priors
Now we will fit the second model. Critically, note that we need to set use_ramp = FALSE
to avoid using the initial social distancing ramp. The start_decline_prior
and end_decline_prior
are given fake priors that are close to 1 and easy to sample from. They can be ignored in the output.
fit2 <- fit_seir( daily_cases = dat2$value, samp_frac_fixed = samp_frac2, f_seg = f_seg2, i0_prior = priors$i0, e_prior = priors$e, start_decline_prior = priors$start_decline, end_decline_prior = priors$end_decline, state_0 = priors$state_0, pars = c(D = 5, k1 = 1/5, k2 = 1, q = 0.05, ud = 0.1, ur = 0.02, f0 = priors$f[1]), f_prior = cbind(c(0.6, 0.6), c(0.2, 0.2)), # or use priors$f to use the last f segment posterior R0_prior = priors$R0, use_ramp = FALSE, # CRITICAL for restarting! N_pop = 5.1e6, # BC population iter = 200, fit_type = "optimizing", # for speed only phi_prior = priors$phi )
Since we need to average over the overlap, we will combine the posteriors before turning them into a data frame with quantiles:
proj1 <- project_seir(fit1, iter = seq_len(50)) proj2 <- project_seir(fit2, iter = seq_len(50)) proj2$day <- proj2$day + min(dat2$day) - 1
These are the fits from the two models:
proj1 %>% tidy_seir() %>% plot_projection(dat1) proj2 %>% tidy_seir() %>% plot_projection(dat2)
Now we can bind the two projections together, convert them into quantiles, join on our day-date lookup table (lut
), and plot the result. The vertical dashed lines indicate the region of overlapping/split model fits.
tidy_proj_combined <- bind_rows(proj1, proj2) %>% tidy_seir(resample_y_rep = 50) %>% left_join(lut, by = "day") tidy_proj_combined %>% plot_projection(obs_dat = dat, value_column = "value", date_column = "date") + geom_vline(xintercept = ymd("2020-05-22"), lty = 2) + geom_vline(xintercept = ymd("2020-05-29"), lty = 2) + theme_light() + theme(axis.title.x = element_blank()) + coord_cartesian(ylim = c(0, 150), expand = FALSE)
In this example you will notice that phi is estimated as larger (less dispersion) in the second block compared to the first, resulting in slightly smaller credible intervals than we would have had with a single fit:
mean(fit1$post$phi) mean(fit2$post$phi)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.