epi_slide_opt: Optimized slide functions for common cases
In cmu-delphi/epiprocess: Tools for basic signal processing in epidemiology

Description

epi_slide_opt allows sliding an n-timestep data.table::froll or slider::summary-slide function over variables in an epi_df object. These functions tend to be much faster than epi_slide(). See vignette("epi_df") for more examples.

epi_slide_mean is a wrapper around epi_slide_opt with .f = data.table::frollmean.

epi_slide_sum is a wrapper around epi_slide_opt with .f = data.table::frollsum.
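
The "n-timestep" wording matters when a series has gaps: a window is meant to span `.window_size` time steps, not `.window_size` rows. The base-R sketch below is illustrative only and is not the package's implementation; the helper `trailing_sum` and the toy data are hypothetical. It contrasts a naive row-based 3-point sum with one computed after filling the missing dates with `NA`.

```r
# Toy daily series with a two-day gap (Jan 4 and Jan 5 are missing).
obs <- data.frame(
  time_value = as.Date("2024-01-01") + c(0, 1, 2, 5, 6),
  cases = c(1, 2, 3, 4, 5)
)

# Hypothetical helper: trailing sum over the last `n` entries of `x`.
trailing_sum <- function(x, n) {
  vapply(seq_along(x), function(i) {
    if (i < n) NA_real_ else sum(x[(i - n + 1):i])
  }, numeric(1))
}

# Row-based window: silently mixes values that are more than 3 days apart.
naive <- trailing_sum(obs$cases, 3)

# Date-based window: fill the gap with NA first, slide, then keep only the
# originally observed dates.
all_dates <- seq(min(obs$time_value), max(obs$time_value), by = "day")
filled <- obs$cases[match(all_dates, obs$time_value)]
by_date <- trailing_sum(filled, 3)[match(obs$time_value, all_dates)]

data.frame(obs, naive, by_date)
```

In this toy series the row-based result reports sums for Jan 6 and Jan 7 even though their 3-day windows contain unobserved dates, while the date-based result marks them as missing.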

Usage

epi_slide_opt(
  .x,
  .col_names,
  .f,
  ...,
  .window_size = NULL,
  .align = c("right", "center", "left"),
  .prefix = NULL,
  .suffix = NULL,
  .new_col_names = NULL,
  .ref_time_values = NULL,
  .all_rows = FALSE
)

epi_slide_mean(
  .x,
  .col_names,
  ...,
  .window_size = NULL,
  .align = c("right", "center", "left"),
  .prefix = NULL,
  .suffix = NULL,
  .new_col_names = NULL,
  .ref_time_values = NULL,
  .all_rows = FALSE
)

epi_slide_sum(
  .x,
  .col_names,
  ...,
  .window_size = NULL,
  .align = c("right", "center", "left"),
  .prefix = NULL,
  .suffix = NULL,
  .new_col_names = NULL,
  .ref_time_values = NULL,
  .all_rows = FALSE
)

Arguments

`.x`	An `epi_df` object. If ungrouped, we temporarily group by `geo_value` and any columns in `other_keys`. If grouped, we make sure the grouping is by `geo_value` and `other_keys`.
`.col_names`	<`tidy-select`> An unquoted column name (e.g., `cases`), multiple column names (e.g., `c(cases, deaths)`), other tidy-select expression, or a vector of characters (e.g. `c("cases", "deaths")`). Variable names can be used as if they were positions in the data frame, so expressions like `x:y` can be used to select a range of variables. The tidy-selection renaming interface is not supported, and cannot be used to provide output column names; if you want to customize the output column names, use `dplyr::rename` after the slide.
`.f`	Function; together with `...` specifies the computation to slide. `.f` must be one of `data.table`'s rolling functions (`frollmean`, `frollsum`, `frollapply`; see `data.table::roll`) or one of `slider`'s specialized sliding functions (`slide_mean`, `slide_sum`, etc.; see `slider::summary-slide`). The optimized `data.table` and `slider` functions can't be passed directly as the computation function in `epi_slide` without careful handling to make sure each computation group covers `.window_size` dates rather than `.window_size` points. `epi_slide_opt` (and the wrapper functions `epi_slide_mean` and `epi_slide_sum`) take care of window completion automatically to prevent the associated errors.
`...`	Additional arguments to pass to the slide computation `.f`, for example, `algo` or `na.rm` in data.table functions. You don't need to specify `.x`, `.window_size`, or `.align` (or `before`/`after` for slider functions).
`.window_size`	The size of the sliding window. The accepted values depend on the type of the `time_value` column in `.x`: if the time type is `Date` and the cadence is daily, then `.window_size` can be an integer (interpreted in units of days) or a `difftime` with units "days"; if the time type is `Date` and the cadence is weekly, then `.window_size` must be a `difftime` with units "weeks"; if the time type is a `yearmonth` or an integer, then `.window_size` must be an integer.
`.align`	The alignment of the sliding window. If "right" (default), then the window has its end at the reference time. This is likely the most common use case, e.g. `.window_size=7` and `.align="right"` slides over the past week of data. If "left", then the window has its start at the reference time. If "center", then the window is centered at the reference time. If the window size is odd, then the window will have floor(window_size/2) points before and after the reference time; if the window size is even, then the window will be asymmetric and have one more value before the reference time than after.
`.prefix`	Optional `glue::glue` format string; name the slide result column(s) by attaching this prefix to the corresponding input column(s). Some shorthand is supported for basing the output names on `.window_size` or other arguments; see "Prefix and suffix shorthand" below.
`.suffix`	Optional `glue::glue` format string; like `.prefix`. The default naming behavior is equivalent to `.suffix = "_{.n}{.time_unit_abbr}{.align_abbr}{.f_abbr}"`. Can be used in combination with `.prefix`.
`.new_col_names`	Optional character vector with length matching the number of input columns from `.col_names`; name the slide result column(s) with these names. Cannot be used in combination with `.prefix` and/or `.suffix`.
`.ref_time_values`	The time values at which to compute the slide values. By default, this is all the unique time values in `.x`.
`.all_rows`	If `.all_rows = FALSE`, the default, then the output `epi_df` will have only the rows that had a `time_value` in `.ref_time_values`. Otherwise, all the rows from `.x` are included, with a missing value marker (typically NA, but more technically the result of `vctrs::vec_cast`-ing `NA` to the type of the slide computation output) in the slide computation column(s) for rows whose `time_value` is not in `.ref_time_values`.
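
The `.align` rules can be read as a count of time steps before and after the reference time. The helper below, `window_offsets`, is a hypothetical illustration written from the `.align` description above; it is not a function in the package.

```r
# Hypothetical helper: number of time steps before/after the reference time
# for a window of `n` steps, following the `.align` description.
window_offsets <- function(n, align = c("right", "center", "left")) {
  align <- match.arg(align)
  switch(align,
    right = c(before = n - 1, after = 0),
    left = c(before = 0, after = n - 1),
    # Odd n: floor(n / 2) on each side. Even n: one more step before than after.
    center = c(before = floor(n / 2), after = ceiling(n / 2) - 1)
  )
}

window_offsets(7, "right")   # before = 6, after = 0
window_offsets(7, "center")  # before = 3, after = 3
window_offsets(6, "center")  # before = 3, after = 2
```

In every case `before + after + 1` equals the window size, since the reference time itself is part of the window.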

Value

An epi_df object with one or more new slide computation columns added. It will be ungrouped if .x was ungrouped, and have the same groups as .x if .x was grouped.
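
Which rows appear in the returned `epi_df` depends on `.ref_time_values` and `.all_rows`. The sketch below assumes that `cases_deaths_subset` is available once `epiprocess` is attached, as the examples on this page imply; the choice of the three most recent dates for `ref` is purely illustrative.

```r
library(dplyr)
library(epiprocess)

x <- cases_deaths_subset %>%
  select(geo_value, time_value, cases)

# Illustrative choice: the three most recent dates in the data.
ref <- tail(sort(unique(x$time_value)), 3)

# Default (.all_rows = FALSE): only rows whose time_value is in `ref`.
only_ref <- x %>%
  epi_slide_sum(cases, .window_size = 7, .ref_time_values = ref)

# .all_rows = TRUE: every row of `x` is kept; rows outside `ref` get a
# missing value in the new `cases_7dsum` column.
all_rows <- x %>%
  epi_slide_sum(
    cases,
    .window_size = 7,
    .ref_time_values = ref,
    .all_rows = TRUE
  )

nrow(only_ref)
nrow(all_rows)
```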

Prefix and suffix shorthand

glue::glue format strings specially interpret content within curly braces. E.g., glue::glue("ABC{2 + 2}") evaluates to "ABC4". For .prefix and .suffix, we provide glue with some additional variable bindings:

{.n} will be the number of time steps in the computation corresponding to the .window_size.
{.time_unit_abbr} will be a lower-case letter corresponding to the time_type of .x.
{.align_abbr} will be "" if .align is the default of "right"; otherwise, it will be the first letter of .align.
{.f_abbr} will be a character vector containing a short abbreviation for .f, taking into account the input column type(s) for .col_names.
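
To see how such a format string expands, the bindings can be mimicked by hand with `glue::glue_data`. This is only a sketch: the list `bindings` is hypothetical, and its values are assumptions chosen to reproduce the `cases_7dsum` name that appears in the examples on this page (a 7-day, right-aligned sum on a daily series).

```r
library(glue)

# Plain glue behavior: code inside braces is evaluated.
glue("ABC{2 + 2}")

# Hand-written stand-ins for the bindings that the package supplies.
bindings <- list(
  .n = 7,
  .time_unit_abbr = "d",
  .align_abbr = "",
  .f_abbr = "sum"
)

# The default suffix format string, expanded against those bindings.
default_suffix <- glue_data(
  bindings,
  "_{.n}{.time_unit_abbr}{.align_abbr}{.f_abbr}"
)
paste0("cases", default_suffix)

# A custom prefix format string using the same shorthand.
custom_prefix <- glue_data(bindings, "roll{.n}{.time_unit_abbr}_")
custom_prefix
```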

Examples

library(dplyr)
library(epiprocess)

# Add a column (`cases_7dsum`) containing a 7-day trailing sum on `cases`:
cases_deaths_subset %>%
  select(geo_value, time_value, cases) %>%
  epi_slide_sum(cases, .window_size = 7)

# Add a column (`case_rate_7dav`) containing a 7-day trailing average on `case_rate`:
covid_case_death_rates_extended %>%
  epi_slide_mean(case_rate, .window_size = 7)

# Use a less common specialized slide function:
cases_deaths_subset %>%
  epi_slide_opt(cases, slider::slide_min, .window_size = 7)

# Specify output column names and/or a naming scheme:
cases_deaths_subset %>%
  select(geo_value, time_value, cases) %>%
  group_by(geo_value) %>%
  epi_slide_sum(cases, .window_size = 7, .new_col_names = "case_sum") %>%
  ungroup()
cases_deaths_subset %>%
  select(geo_value, time_value, cases) %>%
  group_by(geo_value) %>%
  epi_slide_sum(cases, .window_size = 7, .prefix = "sum_") %>%
  ungroup()

# Additional settings can be sent to the {data.table} and {slider} functions
# via `...`. This example passes some `frollmean` settings that control
# speed and accuracy and that allow partially-missing windows:
covid_case_death_rates_extended %>%
  epi_slide_mean(
    case_rate,
    .window_size = 7,
    na.rm = TRUE, algo = "exact", hasNA = TRUE
  )

# If the more specialized possibilities for `.f` don't cover your needs, you
# can use `epi_slide_opt` with `.f = data.table::frollapply` to apply a
# custom function at the cost of more computation time. See also `epi_slide`
# if you need something even more general.
cases_deaths_subset %>%
  select(geo_value, time_value, case_rate_7d_av, death_rate_7d_av) %>%
  epi_slide_opt(c(case_rate_7d_av, death_rate_7d_av),
    data.table::frollapply,
    FUN = median, .window_size = 28,
    .suffix = "_{.n}{.time_unit_abbr}_median"
  ) %>%
  print(n = 40)

cmu-delphi/epiprocess documentation built on April 12, 2025, 12:51 p.m.