In signaturescience/focustools: Forecasting COVID-19 in the United States (FOCUS) tools

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Here we provide an example of how to use focustools to generate forecasts for COVID-19 Forecast Hub targets via time series approaches. First, load the focustools package and other ancillary packages.

library(focustools)
library(dplyr)
library(readr)
library(purrr)
library(fabletools)
library(fable)

Get data

First, let's retrieve COVID-19 data from the JHU source. Note that the get_cases() and get_deaths() functions are executed separately, then joined and converted to a tsibble. Note that granularity="state" would pull in data at the state and territory level.

## Get data at the national scale from JHU source
usac <-  get_cases(source="jhu",  granularity = "national")
usad <- get_deaths(source="jhu",  granularity = "national")

## Use the focustools helper to prep the tsibble format
usa <-  
  dplyr::inner_join(usac, usad, by = c("location", "epiyear", "epiweek")) %>% 
  make_tsibble(chop=TRUE)

## Pull from internal data
usa <- focustools:::usa_20210329

Fit time series models

With the data prepared, we can now fit the models. In this case, we'll fit two models. One will be an ARIMA of incident cases, with parameters automatically optimized. For more details see ?fable::ARIMA. In this case we will restrict the order of non-seasonal auto-regressive terms, order of integration for non-seasonal differencing, and order of non-seasonal moving average terms. The other model will use the three week lagged incident case counts to predict incident deaths:

## Fit incident case model
fits.icases  <- usa %>% model(arima = ARIMA(icases~PDQ(0,0,0)+pdq(1:2,0:2,0), stepwise=FALSE, approximation=FALSE))

## Fit incident death model based on cases three weeks ago
fits.ideaths <- usa %>% model(linear_caselag3 = TSLM(ideaths ~ lag(icases, 3)))

Generate forecasts

With the model fit objects created, we can generate forecasts (including point and quantile estimates) for the outcomes of interest at an arbitrary horizon (here, 4 weeks). Note that given the model we have chosen for the incident deaths outcome, we do need to make sure to generate the incident case forecast first so that we can retrieve point estimates for future cases, which will be passed into the incident deaths forecast:

## Generate incident case forecast
forc.icases <- ts_forecast(fits.icases, outcome = "icases", horizon = 4)

## Need to get future cases to pass to ideaths forecast
futr.icases <- ts_futurecases(usa, forc.icases, horizon = 4)

## Forecast incident deaths based on best guess for cases
forc.ideaths <- ts_forecast(fits.ideaths,  outcome = "ideaths", new_data = futr.icases)

We can use the forecasts of incident deaths to infer cumulative deaths. The ts_forecast() function is flexible enough to switch methods internally, requiring a "inc_forecast" parameter if either "cdeaths" or "ccases" is passed as the "outcome" for the forecast:

## Generate cumulative death forecast
forc.cdeaths <- ts_forecast(outcome = "cdeaths", .data = usa, inc_forecast = forc.ideaths)

Prepare for submission

Prior to submitting these forecasts to the COVID-19 Forecast Hub we need to prepare them in the canonical submission format. Data prep in format_for_submission includes naming targets appropriately and ensuring that target dates adhere to epidemiological week format:

## Create submission object
submission <-
  list(format_for_submission(forc.icases,  target_name = "inc case"),
         format_for_submission(forc.ideaths, target_name = "inc death"),
         format_for_submission(forc.cdeaths, target_name = "cum death")) %>%
  reduce(bind_rows) %>%
  arrange(target)

head(submission)
tail(submission)

Lastly, we can validate the submission. The object must be written to a file and then validated with validate_forecast(), which wraps a python function developed and maintained by the COVID-19 Forecast Hub organizers.

ffile <- file.path(tempdir(), paste0(Sys.Date(), "-focustools-ts.csv"))
submission %>% write_csv(ffile)
validate_forecast(ffile)

Forecast visualization and summary

We can use plot_forecast() to visualize these forecasts:

plot_forecast(.data=usa, submission=submission)

And we can use submission_summary() to produce a summary of the forecasts for each target (cumulative deaths, incident cases, incident deaths):

subsum <- submission_summary(.data=usa, submission=submission)
subsum

We could optionally coerce this structure into a table. For counts:

subsum$counts %>% 
  bind_rows(.id="target") %>% 
  knitr::kable()

Or percent difference:

subsum$perc_diff %>% 
  bind_rows(.id="target") %>% 
  knitr::kable()

signaturescience/focustools documentation built on July 14, 2022, 3:27 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

signaturescience/focustools
Forecasting COVID-19 in the United States (FOCUS) tools

In signaturescience/focustools: Forecasting COVID-19 in the United States (FOCUS) tools

Get data

Fit time series models

Generate forecasts

Prepare for submission

Forecast visualization and summary

R Package Documentation

Browse R Packages

We want your feedback!

signaturescience/focustools Forecasting COVID-19 in the United States (FOCUS) tools

In signaturescience/focustools: Forecasting COVID-19 in the United States (FOCUS) tools

Get data

Fit time series models

Generate forecasts

Prepare for submission

Forecast visualization and summary

R Package Documentation

Browse R Packages

We want your feedback!

signaturescience/focustools
Forecasting COVID-19 in the United States (FOCUS) tools