knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Fitting recent data only and using pre-calculated fits on distant past data

Introduction

When following an epidemic frequently and for a long period of time, it may not be efficient to fit the model to the whole time series of observations (e.g., reported incidence, wastewater concentrations). Instead, we may want to split the time series into two segments: one representing "distant" past data and another that represent "recent" data. The fit for the distant segment has already been performed and there is no need to re-run it every time. Put simply, it's been done once and for all. However, we may want to pay more attention to the recent segment because, its fit is more variable has new data come in. This approach may be well adapted when forecasting (for example).

This vignette shows how to use the function fit_recent() for this purpose.

Data setup

First, let's simulate data to generate an epidemic that will be convenient to fit.

library(dplyr)
library(tidyr)
library(ggplot2)
library(lubridate)
library(wem)
packageVersion('wem')
set.seed(1234)

# Modify the parameters to get an epidemic with a desired shape:
init.prm = model_prm_example()
init.prm$pop.size <- 1e6
init.prm$horizon  <- 400
init.prm$R0       <- 2
init.prm$transm.t <- c(0, 50,  100, 170, 300)
init.prm$transm.v <- c(1, 0.5, 1, 0.7,  1.3)
init.prm$vacc.rate.t <- c(50,60,100)
init.prm$vacc.rate.v <- c(0,0.005,0.005) / 1.3

# Simulate the epidemic that will be 
# the basis of our simulated data:
s = simul(init.prm)
ts = s$ts

# Add noise to mimic real data:
ts2 = generate_obs_noise(df = ts, prms = list(cv=0.1, cv.ww=0.2))

# Build the data object:
data = build_data(
  cases = ts2 %>% 
    mutate(date = ymd('2020-01-01')+time) %>%
    select(date, report.obs) %>% 
    rename(value = report.obs),
  hosp = ts2 %>% 
    mutate(date = ymd('2020-01-01')+time) %>%
    select(date, hosp.admission.obs) %>% 
    rename(value = hosp.admission.obs),
  ww = ts2 %>% 
    mutate(date = ymd('2020-01-01')+time) %>%
    select(date, WWreport.obs) %>% 
    rename(value = WWreport.obs),
  hosp.type = 'hosp.adm',
  case.date.type = 'report')

# Plot the simulated data 
g.simdat = ts2 %>% 
  select(time, report.obs, WWreport.obs) %>% 
  pivot_longer(-time) %>% 
  ggplot(aes(x=time, y=value)) + 
  geom_line() + 
  facet_wrap(~name, scales = 'free_y')
plot(g.simdat)

Fitting the "distant past"

Now, let's fit the segment of data that we consider as the distant past, that is a fit that is done once and for all, and will not be updated later on.

Let's take arbitrarily the time point t=170 that is the data 2020-06-19 as the date that splits what we consider distant past and recent. This choice is driven by the fact that one of the times where the transmission rate changes in our simulated data is 170 (see init.prm in the Data Setup section above).

For the sake of computing speed for this vignette, the number of ABC iterations is set very low (300); in practice, this should be orders of magnitude higher.

# Determine necessary parameters for ABC fitting procedure 
prm.abc = define_abc_prms(iter        = 3e2,
                          accept      = 5e-2,
                          case.weight = 1.0,
                          ww.weight   = 1.0,
                          hosp.weight = 0.0,
                          hosp.type = 'hosp.adm')

# Define priors for fitting 
priors.past = data.frame(
  name = c('R0', 
           'ww.scale', 
           'transm.v_2',
           'transm.v_3'), # <--- the 3rd element of `transm.t` is time=170, our pivot date.
  distrib = c('rnorm', 'rnorm', 'rnorm', 'rnorm'),
  prms = c('1.9;0.2', 
           '0.0003; 0.0001', 
           '0.6; 0.2',
           '1.1; 0.3')
)

# Number of CPU cores used for parallel computing when fitting:
n.cores = parallel::detectCores() - 1

# Fit "past" data to the model
fitobj.past = wem::fit(
  data      = data,
  prm.abc   = prm.abc,
  prm       = init.prm,
  df.priors = priors.past,
  n.cores   = n.cores, 
  last.date = ymd('2020-06-19'))

# Plot fitting results, for the "past" only:
g.past = plot_fit_result(fitobj = fitobj.past)
plot(g.past$fitted.observation + scale_x_date(limits = c(ymd('2020-01-01', '2020-06-19'))))
# plot(g.past$posterior.dist)

The plot above shows how the fit performed for the "distant past" segment only (i.e., until 2020-06-19).

Fitting "recent" data

Now, let's fit only the "recent" data. That is, we want to fit model parameters that are relevant for recent data. Here we'll choose to fit the value of the transmission rate at time t=300 and define its prior the usual way:

priors.recent = data.frame(
  name = c('transm.v_4'),
  distrib = c('rnorm'),
  prms = c('1.2; 0.3')
)

The object fitobj.past (the fit object on distant past data calculated above) is used because the distant past is not fitted once again. Those posteriors for the past are now used as priors when combined with the priors for the parameters to be fitted for the recent segment (here transm.v_4). The function fit_recent() implements this:

fitobj.recent = fit_recent(
  data        = data, 
  prm.abc     = prm.abc, 
  fitobj.past = fitobj.past, # <--- fit on "distant past" data
  df.priors   = priors.recent, 
  prm         = init.prm, 
  n.cores     = n.cores, 
  last.date   = NULL
)

g.recent = plot_fit_result(fitobj = fitobj.recent)
plot(g.recent$fitted.observation + ggtitle('RECENT'))
plot(g.recent$posterior.dist+ ggtitle('RECENT'))

The plot shows the fit performed on the recent data segment only, but using the "distant past" information from the other fitted object fitobj.past into the priors for the parameters associated with the recent data segment.



phac-nml-phrsd/wem documentation built on June 6, 2024, 11:06 p.m.