knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 6.5, fig.height = 6.5, fig.path = "forecasting_multiple_data_streams-" )
The EpiNow2
package contains functionality to make forecasts for multiple data streams that represent different epidemiological outcomes.
For example, it can be used to predict both the number of future cases (with symptoms) and severe cases (hospital admissions).
This vignette demonstrates how to do this on an example simulated data set.
For more information on the models used and the assumptions, look at the methods vignettes for estimate_infections and estimate_secondary.
We first load the EpiNow2 package.
We also load data.table
and ggplot2
which we'll use later for data manipulation and plotting.
Lastly, we set a seed for reproducibility.
library(EpiNow2) library(data.table) library(ggplot2) set.seed(6789)
To speed up compuation you can set the number of cores to use. We will want to run 4 MCMC chains in parallel so if this is available it would make sense to set:
options(mc.cores = 4)
If we had fewer than 4 available or wanted to run fewer than 4 chains (at the expense of some robustness), or had fewer than 4 computing cores available we could set it to that. To find out the number of cores available one can use the detectCores function from the parallel
package.
We first simulate a data set that we will use as an example.
Simulation code
cases <- example_confirmed[1:60]
We then generate another data set of hospitalisations, assuming that 10% of confirmed cases become hospitalised with a delay from case confirmation that is given by a lognormal distribution with mean of 7 days and standard deviation of 3 days.
admission_delay <- LogNormal(mean = 7, sd = 3) admission_scale <- Fixed(0.1)
We use the simulate_secondary
function to generate a time series of hospitalisations
## first, rename the column to what is expected by `simulate_secondary` primary <- cases[, list(date, primary = confirm)] ## generate hospitalisation data hosp <- simulate_secondary( primary, delays = delay_opts(admission_delay), obs = obs_opts(family = "poisson", scale = admission_scale) )
Using this we then create a combined data set:
df <- merge(primary, hosp, by = "date") setnames( df, c("date", "primary", "secondary"), c("date", "cases", "hospitalisations") )
The simulated data set looks like this:
head(df) df_long <- melt(df, id.vars = "date") ggplot(df_long, aes(x = date, y = value, colour = variable)) + geom_line() + xlab("Value") + ylab("Date") + theme_bw() + theme(legend.position = "bottom")
We first use one of the two data streams to estimate the number of infections and make a forecast. Usually this would be done on the outcome that is closest to infection, in our case the number of confirmed cases. In order to do so, we would usually specify a number generation time and any delays to the initial report. This should be done based on insight on the disease and reporting set up. Here we use some example parameters, the same as in the workflow vignette:
generation_time <- Gamma( shape = Normal(9, 2.5), rate = Normal(3, 1.4), max = 10 ) incubation_period <- LogNormal( meanlog = Normal(1.6, 0.05), sdlog = Normal(0.5, 0.05), max = 14 ) reporting_delay <- LogNormal(meanlog = 0.5, sdlog = 0.5, max = 10) combined_delays <- incubation_period + reporting_delay
We then estimate infections and produce a 2-week forecast.
est <- estimate_infections( df[, list(date, confirm = cases)], generation_time = gt_opts(generation_time), delay = delay_opts(combined_delays), forecast = forecast_opts(horizon = 14) )
We next use our data sets of cases and admissions to estimate the delay and scaling between the two (both of which assumed to be constant over time).
This is done using the estimate_secondary()
function.
sec <- estimate_secondary( df[, list(date, primary = cases, secondary = hospitalisations)], obs = obs_opts(scale = Normal(0, 1)) )
We can now combine our case forecast with the estimated delay and scaling to make forecasts of the number of hospitalisations.
In order to do so we use the forecast_secondary
function.
forecast <- forecast_secondary( sec, est )
We can now combine all our forecasts into one plot:
case_forecast <- est$summarised[ variable == "reported_cases" & type == "forecast" ][, variable := "cases"] admissions_forecast <- forecast$predictions[ is.na(secondary) ][, variable := "hospitalisations"] forecasts <- rbindlist(list(case_forecast, admissions_forecast), fill = TRUE) ggplot(df_long, aes(x = date, colour = variable, fill = variable)) + geom_line(aes(y = value)) + geom_ribbon( data = forecasts, aes(ymin = lower_20, ymax = upper_20), alpha = 0.8, colour = NA ) + geom_ribbon( data = forecasts, aes(ymin = lower_50, ymax = upper_50), alpha = 0.5, colour = NA ) + geom_ribbon( data = forecasts, aes(ymin = lower_90, ymax = upper_90), alpha = 0.2, colour = NA ) + xlab("Value") + ylab("Date") + theme_bw() + theme(legend.position = "bottom")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.