In dshemetov/modeltools-mirror: Tools for Building COVID Forecasters and Related Models

This package provides tools for building COVID-19 models, with a focus on forecasting and hotspot prediction models. It is designed to be used in conjunction with the covidcast and evalcast packages.

Installing

This package is not on CRAN yet, so it can be installed using the devtools package:

devtools::install_github("cmu-delphi/covidcast", ref = "main",
                         subdir = "R-packages/modeltools")

Building the vignettes, such as this getting started guide, takes a significant amount of time. They are not included in the package by default. If you want to include vignettes, then use this modified command:

devtools::install_github("cmu-delphi/covidcast", ref = "main",
                         subdir = "R-packages/modeltools",
                         build_vignettes = TRUE, dependencies = TRUE)

For this getting started vignette, we'll fetch COVID-19 case rates, both raw and smoothed via 7-day trailing averages, from the USAFacts data source. We'll look at data at the state level for four states, between the start of June and mid November.

library(covidcast)

start_day <- "2020-06-01"
end_day <- "2020-11-15"
geo_values <- c("ca", "fl", "ny", "tx")

signals <- suppressMessages(
  covidcast_signals(data_source = "usa-facts", 
                    signal = c("confirmed_incidence_prop",
                               "confirmed_7dav_incidence_prop"),
                    start_day = start_day, end_day = end_day, 
                    geo_type = "state", geo_values = geo_values))

summary(signals[[1]])
summary(signals[[2]])

Slide with a formula

One of the most basic tools in the modeltools package is slide_by_geo(), which is based on the family of functions provided by the slider package. In modeltools, to "slide" means to apply the function or formula over a trailing window of n days of data, grouped by geo_value. Many other functions in the package, such as pct_change() and estimate_deriv(), use slide_by_geo() as their workhorse.

For example, to apply a 7-day trailing average to the raw values in the first covidcast_signal data frame, we can specify a formula via the slide_fun argument of slide_by_geo():

library(modeltools)
library(dplyr)

slide_by_geo(signals[[1]], slide_fun = ~ Mean(.x$value), n = 7) %>%
  select(geo_value, time_value, value, slide_value) %>%
  arrange(geo_value) %>% 
  head(10)

The formula specified via slide_fun has access to all columns present in the original covidcast_signal data frame, and must refer to them with the prefix .x$. Here the function Mean() is a simple wrapper around mean() that omits NA values by default (provided by the modeltools package).

Notice that slide_by_geo() returns a data frame with a new column appended that contains the results (from sliding the formula), named "slide_value" by default. We can instead specify a name up front using the col_name argument:

slide_by_geo(signals[[1]], slide_fun = ~ Mean(.x$value), n = 7, 
             col_name = "7dav") %>%
  select(geo_value, time_value, value, `7dav`) %>%
  arrange(geo_value) %>% 
  head(10)

As a simple sanity check, we compare the 7-day trailing average computed via slide_by_geo() to the values of the smoothed signal from the API:

slide_by_geo(signals[[1]], slide_fun = ~ Mean(.x$value), n = 7, 
             col_name = "7dav") %>%
  full_join(signals[[2]], by = c("geo_value", "time_value")) %>%
  mutate(difference = `7dav` - value.y) %>%
  filter(time_value >= "2020-06-07") %>%
  summarize(max(abs(difference)))

Note that this check would fail before June 7 because the API access to data before June 1, whereas slide_by_geo() does not (when a trailing window of n days isn't available, slide_by_geo() just applies the function or formula to whatever data is available).

Slide with a function

We can also pass a function for the slide_fun argument in slide_by_geo(). In this case, the passed function must have the following argument structure: x, a data frame the same column names as the original data frame; followed by any number of named additional arguments; and ending with ..., to capture general additional arguments. Recreating the last example of a 7-day trailing average:

slide_by_geo(signals[[1]], slide_fun = function(x, ...) Mean(x$value), n = 7,
             col_name = "7dav") %>%
  select(geo_value, time_value, value, `7dav`) %>%
  arrange(geo_value) %>% 
  head(10)

As a more sophisticated example, here we show how to sensorize the doctor visits signal. The sensor values are defined by the predicted values from a local (in time) regression of past case rates (response) on past doctor visits (covariate).

# Regression for doctor visits sensorization
dv_regression = function(x, m = 3, ...) {
  n = nrow(x)
  if (n <= m+1) return(NA) # Take care of trivial case

  return(tryCatch(suppressWarnings(suppressMessages({
    # Fit a regression, leaving out the last m days of data
    lm_obj = lm(value.y ~ value.x, data = x[1:(n-m+1), ])

    # Form prediction for the last day of data, and return
    predict(lm_obj, newdata = x[n, ])
  })),
  error = function(e) return(NA)))
}

# Fetch doctor visits signal and join it to case rates
joined = suppressMessages(
  covidcast_signal(data_source = "doctor-visits", 
                   signal = "smoothed_adj_cli", 
                   start_day = start_day, end_day = end_day, 
                   geo_type = "state", geo_values = geo_values)) %>%
  full_join(signals[[2]], by = c("geo_value", "time_value"))

# Perform sensorization for each state; use the last n = 56 days (8 weeks) of 
# data, minus the last m = 3 days, for the regression
slide_by_geo(joined, slide_fun = dv_regression, n = 56, m = 3, 
             col_name = "sensor") %>%
  select(geo_value, time_value, value.x, value.y, sensor) %>%
  arrange(geo_value) %>% 
  head(10)

Above, the first 4 elements are NA because of insufficient training data (we need at least 2 training samples for simple linear regression, and we omit the last 3 days of data).

Returning complex objects

As a final example, we show how to use slide_by_geo() to output more complex objects. We amend the last example so that dv_regression() returns both the predicted value (sensor) and the fitted linear model object. In order to use this with slide_by_geo(), we need to set the col_type argument to be "list":

dv_regression = function(x, m = 3, ...) {
  n = nrow(x)
  if (n <= m+1) return(list(lm_obj = NA, sensor = NA)) # Trivial case

  return(tryCatch(suppressWarnings(suppressMessages({
    # Fit a regression, leaving out the last days of data
    n = nrow(x)
    lm_obj = lm(value.y ~ value.x, data = x[1:(n-m+1), ])

    # Form prediction for the last day of data
    sensor = predict(lm_obj, newdata = x[n, ])

    # Return the fitted lm object and prediction
    list(lm_obj = lm_obj, sensor = sensor)
  })),
  error = function(e) return(list(lm_obj = NA, sensor = NA))))
}

joined <- slide_by_geo(joined, slide_fun = dv_regression, n = 56, m = 3, 
                       col_name = "sensor_obj", col_type = "list")

class(joined$sensor_obj)
names(joined$sensor_obj[[5]]) # The first 4 are lists filled with NAs
joined$sensor_obj[[5]]$lm_obj
joined$sensor_obj[[5]]$sensor

This allows for post-inspection of the sensor models, which may be helpful for diagnostic purposes. Note the way in which we structure the return argument in dv_regression() in its failure cases: it is always a list with the same named arguments. This helps keep the post-inspection code just a bit more simple (it is already a hairy enough, to protect against extracting coefficients from NA objects).

joined %>% 
  rowwise() %>%
  mutate(sensor = sensor_obj$sensor, 
         intercept = ifelse(isTRUE(is.na(sensor_obj$lm_obj)), NA,
                            coef(sensor_obj$lm_obj)[1]),
         slope = ifelse(isTRUE(is.na(sensor_obj$lm_obj)), NA, 
                        coef(sensor_obj$lm_obj)[2])) %>%
  select(geo_value, time_value, sensor, intercept, slope) %>%
  arrange(geo_value) %>% 
  head(20)

dshemetov/modeltools-mirror documentation built on Jan. 7, 2022, 12:23 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

dshemetov/modeltools-mirror
Tools for Building COVID Forecasters and Related Models

In dshemetov/modeltools-mirror: Tools for Building COVID Forecasters and Related Models

Installing

Slide with a formula

Slide with a function

Returning complex objects

R Package Documentation

Browse R Packages

We want your feedback!

dshemetov/modeltools-mirror Tools for Building COVID Forecasters and Related Models

In dshemetov/modeltools-mirror: Tools for Building COVID Forecasters and Related Models

Installing

Slide with a formula

Slide with a function

Returning complex objects

R Package Documentation

Browse R Packages

We want your feedback!

dshemetov/modeltools-mirror
Tools for Building COVID Forecasters and Related Models