growth_rate | R Documentation |
Estimates the growth rate of a signal at given points along the underlying sequence. Several methodologies are available; see the growth rate vignette for examples.
growth_rate(
y,
x = seq_along(y),
x0 = x,
method = c("rel_change", "linear_reg", "smooth_spline", "trend_filter"),
h = 7,
log_scale = FALSE,
na_rm = FALSE,
params = growth_rate_params()
)
y |
Signal values. |
x |
Design points corresponding to the signal values y. Default is seq_along(y). |
x0 |
Points at which we should estimate the growth rate. Must be contained in the range of x. Default is x. |
method |
Either "rel_change", "linear_reg", "smooth_spline", or "trend_filter", indicating the method to use for the growth rate calculation. The first two are local methods: they are run in a sliding fashion over the sequence (in order to estimate derivatives and hence growth rates); the latter two are global methods: they are run once over the entire sequence. See details for more explanation. |
h |
Bandwidth for the sliding window, when method is "rel_change" or "linear_reg". Default is 7. See details for more explanation. |
log_scale |
Should growth rates be estimated using the parametrization on the log scale? See details for an explanation. Default is FALSE. |
na_rm |
Should missing values be removed before the computation? Default is FALSE. |
params |
Additional arguments to pass to the method used to estimate the derivative. This should be created with growth_rate_params(). |
The growth rate of a function f defined over a continuously-valued parameter t is defined as f'(t) / f(t), where f'(t) is the derivative of f at t. A signal in discrete time can be thought of as evaluations (discretizations) of an underlying function in continuous time. To estimate its growth rate, we can therefore estimate the derivative and divide by the signal value itself (or possibly a smoothed version of the signal value).
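As a quick illustration of this definition, the sketch below uses base R only. The exponential signal and the central-difference derivative are assumptions chosen for illustration; this is not how growth_rate() computes its estimates. For f(t) = exp(r * t), the growth rate f'(t) / f(t) equals r at every t.

```r
# Hypothetical signal: f(t) = exp(r * t), whose growth rate is exactly r.
r <- 0.05
t <- 1:30
y <- exp(r * t)

# Central-difference estimate of f'(t) at the interior points t = 2, ..., 29.
n <- length(y)
deriv <- (y[3:n] - y[1:(n - 2)]) / (t[3:n] - t[1:(n - 2)])

# Growth rate estimate: derivative divided by the signal value.
gr <- deriv / y[2:(n - 1)]
range(gr)  # close to r = 0.05 at every interior point
```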
The following methods are available for estimating the growth rate:
"rel_change": uses (B/A - 1) / h, where B is the average of y over the second half of a sliding window of bandwidth h centered at the reference point x0, and A the average over the first half. This can be seen as using a first-difference approximation to the derivative.
"linear_reg": uses the slope from a linear regression of y on x over a sliding window centered at the reference point x0, divided by the fitted value from this linear regression at x0.
"smooth_spline": uses the estimated derivative at x0 from a smoothing spline fit to x and y, via stats::smooth.spline(), divided by the fitted value of the spline at x0.
"trend_filter": uses the estimated derivative at x0 from polynomial trend filtering (a discrete spline) fit to x and y, via trendfilter::trendfilter(), divided by the fitted value of the discrete spline at x0. This method requires the {trendfilter} package to be installed.
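The two local methods can be sketched in base R as follows. The helper names rel_change_at() and linear_reg_at() are hypothetical, and details such as which half of the window contains the reference point are assumptions; the package's own implementation may differ.

```r
# Hypothetical helpers (not the package's code): local growth rate estimates
# at a single reference point x_ref, using points within distance h of it.
rel_change_at <- function(x, y, x_ref, h) {
  in_window <- abs(x - x_ref) <= h
  first <- in_window & x <= x_ref   # first half (assumed to include x_ref)
  second <- in_window & x > x_ref   # second half
  a <- mean(y[first])
  b <- mean(y[second])
  (b / a - 1) / h
}

linear_reg_at <- function(x, y, x_ref, h) {
  in_window <- abs(x - x_ref) <= h
  fit <- lm(yy ~ xx, data = data.frame(xx = x[in_window], yy = y[in_window]))
  slope <- unname(coef(fit)["xx"])
  fitted_at_ref <- unname(predict(fit, newdata = data.frame(xx = x_ref)))
  slope / fitted_at_ref
}

# Linear test signal y = 100 + 2 * x, so f'(x) / f(x) = 2 / (100 + 2 * x).
x <- 1:30
y <- 100 + 2 * x
gr_lin <- linear_reg_at(x, y, x_ref = 15, h = 7)  # slope 2 over fitted value 130
gr_rel <- rel_change_at(x, y, x_ref = 15, h = 7)
c(linear_reg = gr_lin, rel_change = gr_rel)
```

On this noiseless linear signal the regression-based estimate recovers 2 / 130 up to floating-point error, while the relative-change estimate is a coarser approximation of the same quantity.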
An alternative view for the growth rate of a function f in general is given by defining g(t) = log(f(t)), and then observing that g'(t) = f'(t) / f(t). Therefore, any method that estimates the derivative can be simply applied to the log of the signal of interest, and in this light, each method above ("rel_change", "linear_reg", "smooth_spline", and "trend_filter") has a log scale analog, which can be used by setting log_scale = TRUE.
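The identity g'(t) = f'(t) / f(t) can be checked numerically. The sketch below is an illustration in base R under assumed inputs (the signal f(t) = t^2 and a first-difference derivative are chosen for convenience); it is not the package's log-scale code path.

```r
# Hypothetical signal with a time-varying growth rate.
t <- seq(1, 10, by = 0.01)
f <- t^2                      # f'(t) / f(t) = 2 / t

# First differences of f and of g = log(f), on the same grid.
dt <- diff(t)
t_left <- t[-length(t)]
gr_direct <- (diff(f) / dt) / f[-length(f)]   # derivative of f, divided by f
gr_log <- diff(log(f)) / dt                   # derivative of log(f)

# Both approximate 2 / t; they differ only by discretization error.
max(abs(gr_direct - 2 / t_left))
max(abs(gr_log - 2 / t_left))
```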
For the local methods, "rel_change" and "linear_reg", we use a sliding window of bandwidth h centered at the reference point. In other words, the sliding window consists of all points in x whose distance to the reference point is at most h. Note that the unit for this distance is implicitly defined by the x variable; for example, if x is a vector of Date objects, h = 7, and the reference point is January 7, then the sliding window contains all data between January 1 and January 14 (matching the behavior of epi_slide() with before = h - 1 and after = h).
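The Date example above can be reproduced with base R date arithmetic. This is only a sketch of the window described in the text (before = h - 1 and after = h days); the year is arbitrary and chosen for illustration.

```r
# Daily dates; the year is an arbitrary choice for illustration.
x <- seq(as.Date("2021-12-25"), as.Date("2022-01-31"), by = "day")
x_ref <- as.Date("2022-01-07")
h <- 7

# Window from the example: h - 1 days before and h days after the reference point.
in_window <- x >= x_ref - (h - 1) & x <= x_ref + h
range(x[in_window])  # "2022-01-01" "2022-01-14"
sum(in_window)       # 14 daily observations
```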
For the global methods, "smooth_spline" and "trend_filter", additional arguments can be specified via params for the underlying estimation function. These additional arguments are passed to stats::smooth.spline(), trendfilter::trendfilter(), or trendfilter::cv_trendfilter(). The defaults are exactly as specified in those functions, except when those defaults conflict among these functions. These cases are as follows:
df: desired effective degrees of freedom. For "smooth_spline", this must be numeric (or NULL) and will be passed along to the underlying function. For "trend_filter", if cv = FALSE, then df must be a positive number (integer is most sensible); if cv = TRUE, then df must be one of "min" or "1se" indicating the selection rule to use based on the cross-validation error curve: minimum or 1-standard-error rule, respectively. The default is "min" (going along with the default cv = TRUE).
lambda: For "smooth_spline", this should be a scalar value or NULL. For "trend_filter", this is allowed to also be a vector, as long as either cv = TRUE or df is specified.
cv: should cross-validation be used to choose an effective degrees of freedom for the fit? The default is FALSE to match stats::smooth.spline(). In that case, as in that function, GCV is used instead. For "trend_filter", this will be coerced to TRUE if neither df nor lambda is specified (the default).
Note that passing both df and a scalar lambda will always be an error.
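To make the role of df concrete for the "smooth_spline" case, the sketch below calls stats::smooth.spline() directly on simulated data. The simulated signal and the helper spline_growth() are hypothetical, and how growth_rate() forwards params internally is not shown here.

```r
set.seed(1)
x <- 1:60
# Noisy exponential signal with a true growth rate of 0.03 (assumed for illustration).
y <- exp(0.03 * x) * (1 + rnorm(60, sd = 0.02))

# Default fit: smoothing parameter chosen by GCV (cv = FALSE in smooth.spline()).
fit_gcv <- stats::smooth.spline(x, y)
# Fit with a requested effective degrees of freedom.
fit_df <- stats::smooth.spline(x, y, df = 6)

# Growth rate at x0: estimated derivative divided by the fitted value.
spline_growth <- function(fit, x0) {
  predict(fit, x = x0, deriv = 1)$y / predict(fit, x = x0)$y
}

x0 <- c(20, 30, 40)
spline_growth(fit_gcv, x0)
spline_growth(fit_df, x0)
```

Both sets of estimates should land near the true rate of 0.03; a smaller df gives a smoother fit and typically less variable derivative estimates.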
Vector of growth rate estimates at the specified points x0.
# Packages assumed for these examples: dplyr supplies %>%, group_by(), and mutate()
library(dplyr)
library(epiprocess)

# COVID cases growth rate by state, using the default method (relative change)
cases_deaths_subset %>%
  group_by(geo_value) %>%
  mutate(cases_gr = growth_rate(x = time_value, y = cases))

# Degree 3 polynomial and 5-fold cross validation on the log scale
# Some locations report 0 cases, so we replace these with 1
cases_deaths_subset %>%
  group_by(geo_value) %>%
  mutate(gr_poly = growth_rate(
    x = time_value, y = pmax(cases, 1), method = "trend_filter",
    log_scale = TRUE, na_rm = TRUE
  ))