In dshemetov/modeltools-mirror: Tools for Building COVID Forecasters and Related Models

In this vignette, we'll take a look at estimating derivatives of signals using the estimate_deriv() function. We'll again demonstrate this functionality on state-level COVID-19 case rates, smoothed via 7-day trailing averages, from the USAFacts data source.

library(covidcast)

start_day <- "2020-06-01"
end_day <- "2020-11-15"
geo_values <- c("ca", "fl", "ny", "tx")

case_rates <- suppressMessages(
  covidcast_signal(data_source = "usa-facts", 
                   signal = "confirmed_7dav_incidence_prop",
                   start_day = start_day, end_day = end_day, 
                   geo_type = "state", geo_values = geo_values))

summary(case_rates)

Estimating derivatives via linear regression

The function for estimating derivatives is called estimate_deriv(), and (aside from a covidcast_signal data frame) takes two primary arguments: method, indicating the method to use for derivative estimation; and n indicating the trailing sample size (number of days) to use in training the given method. Here we use method = "lin", the default, which uses the slope from a simple linear regression, and n = 14, also the default.

library(modeltools)
library(dplyr)

case_rates <- estimate_deriv(case_rates, method = "lin", n = 14)

case_rates %>% 
  arrange(geo_value) %>% 
  select(geo_value, time_value, value, deriv)

We can see that a column deriv has been added to the output data frame, which contains the derivative estimates. Below we visualize these estimates in tandem with the signal itself. The red dots mark time points at which the derivative estimate exceeds a threshold (arbitrarily chosen) of 0.25. These seem to roughly but reasonably mark times of upswing in the underlying signal.

library(ggplot2)
library(gridExtra)

state = "fl"
threshold = 0.25

p1 <- ggplot(case_rates %>% filter(geo_value == state),
             aes(x = time_value, y = value)) +
  geom_line() + 
  geom_point(data = case_rates %>% 
               filter(geo_value == state, deriv >= threshold),
             aes(x = time_value, y = value), color = "red") + 
  labs(x = "Date", y = "Cases per 100,00 people") 

p2 <- ggplot(case_rates %>% filter(geo_value == state),
             aes(x = time_value, y = deriv)) +
  geom_line() + 
  geom_hline(yintercept = threshold, linetype = 2) +
  labs(x = "Date", y = "Derivative (linear regression)") 

grid.arrange(p1, p2, nrow = 1)

Estimating derivatives via smoothing spline

Now we consider method = "ss", which uses a smoothing spline for the estimate of the derivative. That is, at each time point, we fit a natural cubic spline to the data from the trailing n days, and return the derivative of the underlying fitted spline at the current time as the estimate. Here we set n = 28, a bit higher sample size, and fit the spline in two ways: first, using a fixed degrees of freedom of 8; and second, using cross-validation to choose the amount of regularization (tuning parameter). This is accomplished by passing additional arguments to estimate_deriv(), which are in turn passed on to the underlying function it uses to fit smoothing splines, stats::smooth.spline(). Note that we also set a custom name for the output column with the estimated derivatives, via the col_name argument.

case_rates <- estimate_deriv(case_rates, method = "ss", n = 28,
                             col_name = "deriv_ss1", df = 8)

case_rates <- estimate_deriv(case_rates, method = "ss", n = 28,
                             col_name = "deriv_ss2", cv = TRUE)

p1 <- ggplot(case_rates %>% filter(geo_value == state), 
             aes(x = time_value, y = value)) +
  geom_line() + 
  geom_point(data = case_rates %>% 
               filter(geo_value == state, deriv_ss1 >= threshold),
             aes(x = time_value, y = value), color = "red") + 
  geom_point(data = case_rates %>% 
               filter(geo_value == state, deriv_ss2 >= threshold),
             aes(x = time_value, y = value), color = "blue", shape = 21) +
  labs(x = "Date", y = "Cases per 100,00 people") 

p2 <- ggplot(case_rates %>% filter(geo_value == state), 
             aes(x = time_value)) +
  geom_line(aes(y = deriv_ss1), color = "red") + 
  geom_line(aes(y = deriv_ss2), color = "blue") + 
  geom_hline(yintercept = threshold, linetype = 2) +
  labs(x = "Date", y = "Derivative (smoothing spline)") 

grid.arrange(p1, p2, nrow = 1)

The estimated derivates---in red for the smoothing spline with a fixed degrees of freedom of 8, and in blue for that tuned by cross-validation---appear less smooth than those above, from linear regression. Using cross-validation offers more adaptivity to the time-varying level of smoothness, as is apparent from comparing the red and blue derivative estimates in October and November.

Estimating derivatives via trend filtering

Lastly we consider method = tf", which uses trend filtering for estimating the derivative. That is, at each time point, we fit a discrete spline of quadratic order to the data from the trailing n days, and return the discrete derivative of the underlying fitted spline at the current time as the estimate. As before, we fit the spline in two ways: first, using a fixed degrees of freedom of 8; and second, using cross-validation to choose the amount of regularization. Since the optimization here takes a while (it's based on computing a full solution path
for the trend filtering problem, via the genlasso::trendfilter() function), we only compute derivatives for Florida.

case_rates_state <- case_rates %>% filter(geo_value == state)

case_rates_state <- estimate_deriv(case_rates_state, method = "tf", n = 28, 
                                   col_name = "deriv_tf1", df = 8)

case_rates_state <- estimate_deriv(case_rates_state, method = "tf", n = 28, 
                                   col_name = "deriv_tf2", cv = TRUE)

p1 <- ggplot(case_rates_state, aes(x = time_value, y = value)) +
  geom_line() + 
  geom_point(data = case_rates_state %>% filter(deriv_tf1 >= threshold),
             aes(x = time_value, y = value), color = "red") + 
  geom_point(data = case_rates_state %>% filter(deriv_tf2 >= threshold),
             aes(x = time_value, y = value), color = "blue", shape = 21) +
  labs(x = "Date", y = "Cases per 100,00 people") 

p2 <- ggplot(case_rates_state, aes(x = time_value)) +
  geom_line(aes(y = deriv_tf1), color = "red") + 
  geom_line(aes(y = deriv_tf2), color = "blue") + 
  geom_hline(yintercept = threshold, linetype = 2) +
  labs(x = "Date", y = "Derivative (trend filtering)") 

grid.arrange(p1, p2, nrow = 1)

The estimated derivates now appear a bit smoother than the last ones, from the smoothing spline methods. Again, using cross-validation offers a noticeable improvement in adapting to to the time-varying level of smoothness, as is very clear from the differences between red and blue derivative estimates in October and November.

Post-hoc inspection

In the call to estimate_deriv(), we can set keep_obj = TRUE to keep around a second column with the fitted model objects. For example, here, we can look at the p-values associated with the estimated slopes from lsfit().

case_rates <- estimate_deriv(case_rates, method = "lin", n = 14, 
                             keep_obj = TRUE)

class(case_rates$deriv_obj)
ls.print(case_rates$deriv_obj[[7]])

case_rates <- case_rates %>%
  rowwise() %>%
  mutate(p_value = quiet(
    tryCatch(ls.print(deriv_obj)$coef.table[[1]][2,"Pr(>|t|)"], 
             error = function(e) NA))) 

case_rates %>% 
  arrange(geo_value) %>% 
  select(geo_value, time_value, value, deriv, deriv_obj, p_value)

dshemetov/modeltools-mirror documentation built on Jan. 7, 2022, 12:23 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

dshemetov/modeltools-mirror
Tools for Building COVID Forecasters and Related Models

In dshemetov/modeltools-mirror: Tools for Building COVID Forecasters and Related Models

Estimating derivatives via linear regression

Estimating derivatives via smoothing spline

Estimating derivatives via trend filtering

Post-hoc inspection

R Package Documentation

Browse R Packages

We want your feedback!

dshemetov/modeltools-mirror Tools for Building COVID Forecasters and Related Models

In dshemetov/modeltools-mirror: Tools for Building COVID Forecasters and Related Models

Estimating derivatives via linear regression

Estimating derivatives via smoothing spline

Estimating derivatives via trend filtering

Post-hoc inspection

R Package Documentation

Browse R Packages

We want your feedback!

dshemetov/modeltools-mirror
Tools for Building COVID Forecasters and Related Models