growth_rate: Estimate growth rate

View source: R/growth_rate.R

growth_rate    R Documentation

Estimate growth rate

Description

Estimates the growth rate of a signal at given points along the underlying sequence. Several methodologies are available; see the growth rate vignette for examples.

Usage

growth_rate(
  x = seq_along(y),
  y,
  x0 = x,
  method = c("rel_change", "linear_reg", "smooth_spline", "trend_filter"),
  h = 7,
  log_scale = FALSE,
  dup_rm = FALSE,
  na_rm = FALSE,
  ...
)

Arguments

x

Design points corresponding to the signal values y. Default is seq_along(y) (that is, equally-spaced points from 1 to the length of y).

y

Signal values.

x0

Points at which we should estimate the growth rate. Must be a subset of x (no extrapolation allowed). Default is x.

method

Either "rel_change", "linear_reg", "smooth_spline", or "trend_filter", indicating the method to use for the growth rate calculation. The first two are local methods: they are run in a sliding fashion over the sequence (in order to estimate derivatives and hence growth rates). The latter two are global methods: they are run once over the entire sequence. See Details for more explanation.

h

Bandwidth for the sliding window, when method is "rel_change" or "linear_reg". See details for more explanation.

log_scale

Should growth rates be estimated using the parametrization on the log scale? See details for an explanation. Default is FALSE.

dup_rm

Should we check and remove duplicates in x (and corresponding elements of y) before the computation? Some methods might handle duplicate x values gracefully, whereas others might fail (either quietly or loudly). Default is FALSE.

na_rm

Should missing values be removed before the computation? Default is FALSE.

...

Additional arguments to pass to the method used to estimate the derivative.
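The dup_rm and na_rm options describe a preprocessing step applied before any method runs. The following is a minimal sketch of such a step in base R, not the epiprocess implementation: the helper preprocess() is hypothetical, and keeping the first occurrence of each duplicated x value is an assumption.

```r
# Hypothetical preprocessing helper (not epiprocess code) mirroring the
# dup_rm and na_rm options described above.
preprocess <- function(x, y, dup_rm = FALSE, na_rm = FALSE) {
  if (dup_rm) {
    # Assumption: keep the first occurrence of each duplicated x value,
    # dropping the corresponding elements of y as well
    keep <- !duplicated(x)
    x <- x[keep]
    y <- y[keep]
  }
  if (na_rm) {
    # Drop any pair where either the design point or the signal is missing
    keep <- !is.na(x) & !is.na(y)
    x <- x[keep]
    y <- y[keep]
  }
  list(x = x, y = y)
}

out <- preprocess(c(1, 2, 2, 3, 4), c(10, 12, 13, NA, 15),
                  dup_rm = TRUE, na_rm = TRUE)
out$x  # 1 2 4
out$y  # 10 12 15
```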

Details

The growth rate of a function f of a continuous parameter t is defined as f'(t) / f(t), where f'(t) is the derivative of f at t. To estimate the growth rate of a signal in discrete time (which can be thought of as evaluations or discretizations of an underlying function in continuous time), we can therefore estimate the derivative and divide by the signal value itself (or possibly by a smoothed version of the signal value).
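As a sanity check on the definition, an exponential function f(t) = c * exp(r * t) has f'(t) / f(t) = r at every t. The sketch below, which uses only base R and a hypothetical helper numeric_growth_rate(), approximates the derivative by a central difference and divides by the function value, recovering r.

```r
# A minimal sketch (not epiprocess code): approximate f'(t) / f(t) for an
# exponentially growing function, whose true growth rate is the constant r.
r <- 0.1
f <- function(t) 5 * exp(r * t)

# Hypothetical helper: central-difference derivative divided by f(t)
numeric_growth_rate <- function(f, t, eps = 1e-4) {
  deriv <- (f(t + eps) - f(t - eps)) / (2 * eps)
  deriv / f(t)
}

gr <- numeric_growth_rate(f, t = c(0, 10, 20))
print(gr)  # each value is very close to r = 0.1
```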

The following methods are available for estimating the growth rate:

  • "rel_change": uses (B/A - 1) / h, where B is the average of y over the second half of a sliding window of bandwidth h centered at the reference point x0, and A is the average over the first half. This can be seen as using a first-difference approximation to the derivative.

  • "linear_reg": uses the slope from a linear regression of y on x over a sliding window centered at the reference point x0, divided by the fitted value from this linear regression at x0.

  • "smooth_spline": uses the estimated derivative at x0 from a smoothing spline fit to x and y, via stats::smooth.spline(), divided by the fitted value of the spline at x0.

  • "trend_filter": uses the estimated derivative at x0 from polynomial trend filtering (a discrete spline) fit to x and y, via genlasso::trendfilter(), divided by the fitted value of the discrete spline at x0.
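The two local methods above can be sketched in base R as follows. This is a minimal sketch, not the epiprocess implementation: the helper names rel_change_gr() and linear_reg_gr() are hypothetical, and the window boundaries (points in (x0 - h, x0 + h], split at x0) are an assumption inferred from the Date example under Sliding Windows.

```r
# Hypothetical helpers sketching "rel_change" and "linear_reg" at a single
# reference point x0. Window assumption: points in (x0 - h, x0 + h].
rel_change_gr <- function(x, y, x0, h = 7) {
  left <- x > x0 - h & x <= x0    # first half of the window (includes x0)
  right <- x > x0 & x <= x0 + h   # second half of the window
  a <- mean(y[left])              # A: average over the first half
  b <- mean(y[right])             # B: average over the second half
  (b / a - 1) / h
}

linear_reg_gr <- function(x, y, x0, h = 7) {
  in_win <- x > x0 - h & x <= x0 + h
  fit <- lm(y ~ x, data = data.frame(x = x[in_win], y = y[in_win]))
  slope <- unname(coef(fit)[2])
  # Divide the slope by the fitted value of the regression at x0
  fitted_x0 <- unname(predict(fit, newdata = data.frame(x = x0)))
  slope / fitted_x0
}

x <- 1:30
y <- 100 + 2 * x  # linear signal: true growth rate at x0 is 2 / (100 + 2 * x0)
gr_rc <- rel_change_gr(x, y, x0 = 15)
gr_lr <- linear_reg_gr(x, y, x0 = 15)
c(rel_change = gr_rc, linear_reg = gr_lr)
```

For this linear signal, linear_reg_gr() recovers the exact growth rate 2 / 130 at x0 = 15, whereas rel_change_gr() returns 2 / 124, because A is an average over the first half of the window rather than the signal value at x0.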

Log Scale

An alternative view of the growth rate of a function f in general is given by defining g(t) = log(f(t)), and then observing that g'(t) = f'(t) / f(t). Therefore, any method that estimates the derivative can simply be applied to the log of the signal of interest, and in this light, each method above ("rel_change", "linear_reg", "smooth_spline", and "trend_filter") has a log scale analog, which can be used by setting log_scale = TRUE.
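The equivalence g'(t) = f'(t) / f(t) can be illustrated with stats::smooth.spline() directly. The sketch below is not the epiprocess implementation; it compares the original-scale estimate (derivative divided by fitted value) with the log-scale estimate (derivative of the spline fit to log(y)) on a noise-free exponential signal. The choice df = 10 is an arbitrary assumption for illustration.

```r
# A minimal sketch using only base R (not the epiprocess implementation).
x <- 1:50
y <- exp(0.05 * x)  # exponential signal with true growth rate 0.05
x0 <- 25

# Original scale: f'(x0) / f(x0), both taken from the same spline fit
fit_raw <- smooth.spline(x, y, df = 10)
gr_raw <- predict(fit_raw, x0, deriv = 1)$y / predict(fit_raw, x0)$y

# Log scale: g'(x0) with g = log(f); no division is needed
fit_log <- smooth.spline(x, log(y), df = 10)
gr_log <- predict(fit_log, x0, deriv = 1)$y

c(raw = gr_raw, log = gr_log)
```

Because log(y) is exactly linear here and linear functions are not penalized by the smoothing spline, the log-scale fit reproduces the slope 0.05 essentially exactly. Note that the log scale requires strictly positive signal values.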

Sliding Windows

For the local methods, "rel_change" and "linear_reg", we use a sliding window of bandwidth h centered at the reference point. More precisely, the sliding window consists of all points in x that lie up to h - 1 units before the reference point or up to h units after it. Note that the unit for this distance is implicitly defined by the x variable; for example, if x is a vector of Date objects, h = 7, and the reference point is January 7, then the sliding window contains all data from January 1 through January 14 (matching the behavior of epi_slide() with before = h - 1 and after = h).
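The Date example can be checked with base R date arithmetic, where subtracting or adding h to a Date shifts it by h days. This minimal sketch assumes the window is (x0 - h, x0 + h], which reproduces the January 1 to January 14 range described above; the year 2024 is an arbitrary choice.

```r
# Which dates fall in the window for reference point January 7 with h = 7?
# Window assumption: (x0 - h, x0 + h], i.e. up to h - 1 days before x0
# and up to h days after it.
dates <- seq(as.Date("2023-12-27"), as.Date("2024-01-31"), by = "day")
x0 <- as.Date("2024-01-07")
h <- 7

in_window <- dates > x0 - h & dates <= x0 + h
range(dates[in_window])  # "2024-01-01" "2024-01-14"
sum(in_window)           # 14 days in total
```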

Additional Arguments

For the global methods, "smooth_spline" and "trend_filter", additional arguments can be specified via ... for the underlying estimation function. For the smoothing spline case, these additional arguments are passed directly to stats::smooth.spline() (with exactly the same defaults as in that function). The trend filtering case works a bit differently: here, a custom set of arguments is allowed (which are distributed internally to genlasso::trendfilter() and genlasso::cv.trendfilter()):

  • ord: order of the piecewise polynomial for the trend filtering fit. Default is 3.

  • maxsteps: maximum number of steps to take in the solution path before terminating. Default is 1000.

  • cv: should cross-validation be used to choose an effective degrees of freedom for the fit? Default is TRUE.

  • k: number of folds if cross-validation is to be used. Default is 3.

  • df: desired effective degrees of freedom for the trend filtering fit. If cv = FALSE, then df must be a positive integer; if cv = TRUE, then df must be one of "min" or "1se", indicating the selection rule to use based on the cross-validation error curve: the minimum or the 1-standard-error rule, respectively. Default is "min" (going along with the default cv = TRUE). Note that the default does not apply when cv = FALSE, so in that case df must be set explicitly by the user.
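The two selection rules for df can be illustrated on a toy cross-validation error curve. The sketch below is a generic form of the "min" and 1-standard-error rules, not genlasso's internal code: the helper select_df() is hypothetical, and the error and standard-error values are made up for illustration only.

```r
# Toy illustration of the "min" and "1se" rules on a hypothetical
# cross-validation error curve (values made up for illustration only).
cv_df <- c(2, 4, 6, 8, 10, 12)             # candidate degrees of freedom
cv_err <- c(9.0, 5.1, 4.2, 4.0, 4.1, 4.3)  # mean CV error per candidate
cv_se <- c(0.5, 0.4, 0.3, 0.3, 0.3, 0.4)   # standard error of each mean

select_df <- function(df, err, se, rule = c("min", "1se")) {
  rule <- match.arg(rule)
  i_min <- which.min(err)
  if (rule == "min") return(df[i_min])
  # 1se rule: simplest fit (smallest df) whose CV error is within one
  # standard error of the minimum CV error
  threshold <- err[i_min] + se[i_min]
  min(df[err <= threshold])
}

select_df(cv_df, cv_err, cv_se, "min")  # 8
select_df(cv_df, cv_err, cv_se, "1se")  # 6
```

The 1-standard-error rule trades a slightly higher estimated error for a smoother (lower df) fit.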

Value

Vector of growth rate estimates at the specified points x0.

Examples

library(dplyr)
library(epiprocess)

# COVID cases growth rate by state using the default method, relative change
cases_deaths_subset %>%
  group_by(geo_value) %>%
  mutate(cases_gr = growth_rate(x = time_value, y = cases))

# Trend filtering on the log scale, with a degree 4 polynomial and 6-fold
# cross-validation (ord and k only apply to method = "trend_filter")
cases_deaths_subset %>%
  group_by(geo_value) %>%
  mutate(gr_poly = growth_rate(
    x = time_value, y = cases, method = "trend_filter",
    log_scale = TRUE, ord = 4, k = 6
  ))

cmu-delphi/epiprocess documentation built on Oct. 29, 2024, 5:37 p.m.