group_by.epi_archive: 'group_by' and related methods for 'epi_archive',...

View source: R/archive.R

group_by.epi_archive    R Documentation

group_by and related methods for epi_archive, grouped_epi_archive

Description

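group_by methods for epi_archive and grouped_epi_archive objects, together with the related grouped_epi_archive methods group_by_drop_default, group_vars, groups, and ungroup, and the helper is_grouped_epi_archive.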

Usage

## S3 method for class 'epi_archive'
group_by(.data, ..., .add = FALSE, .drop = dplyr::group_by_drop_default(.data))

## S3 method for class 'grouped_epi_archive'
group_by(.data, ..., .add = FALSE, .drop = dplyr::group_by_drop_default(.data))

## S3 method for class 'grouped_epi_archive'
group_by_drop_default(.tbl)

## S3 method for class 'grouped_epi_archive'
group_vars(x)

## S3 method for class 'grouped_epi_archive'
groups(x)

## S3 method for class 'grouped_epi_archive'
ungroup(x, ...)

is_grouped_epi_archive(x)

Arguments

.data

An epi_archive or grouped_epi_archive.

...

Similar to dplyr::group_by (see "Details" for edge cases):

  • For group_by: unquoted variable name(s) or other "data masking" expression(s). It's possible to use dplyr::mutate-like syntax here to calculate new columns on which to perform grouping, but note that, if you are regrouping an already-grouped .data object, the calculations will be carried out ignoring such grouping (same as in dplyr).

  • For ungroup: either

    • empty, in order to remove the grouping and output an epi_archive; or

    • variable name(s) or other "tidy-select" expression(s), in order to remove the matching variables from the list of grouping variables, and output another grouped_epi_archive.

.add

Boolean. If FALSE, the default, the output will be grouped by the variable selection from ... only; if TRUE, the output will be grouped by the current grouping variables plus the variable selection from ....

.drop

As described in dplyr::group_by; determines treatment of factor columns.

.tbl

A grouped_epi_archive object.

x

For groups, group_vars, or ungroup: a grouped_epi_archive; for is_grouped_epi_archive: any object.

Details

To match dplyr, group_by allows "data masking" (also referred to as "tidy evaluation") expressions ..., not just column names, in a way similar to mutate. Note that replacing or removing key columns with these expressions is disabled.
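As a rough illustration of the data-masking behavior described above, the following sketch groups a small toy archive by a column computed inside group_by. The toy data mirrors the archive used in the Examples section, and the column name is_adult is a hypothetical name chosen for this illustration only.

```r
library(dplyr)
library(epiprocess)

# Toy archive with made-up values; `age_group` is an extra key column.
toy_archive <- tibble(
  geo_value = "us",
  age_group = c("adult", "pediatric", "adult", "adult"),
  time_value = as.Date(c("2000-01-01", "2000-01-02", "2000-01-01", "2000-01-02")),
  version = as.Date(c("2000-01-02", "2000-01-03", "2000-01-03", "2000-01-03")),
  value = c(121, 5, 125, 130)
) %>%
  as_epi_archive(other_keys = "age_group")

# Group by a new column computed with mutate-like syntax.
# `is_adult` is a hypothetical column name, not part of the package.
by_adult <- toy_archive %>%
  group_by(is_adult = age_group == "adult")

# Inspect the resulting grouping variable names.
group_vars(by_adult)
```

By contrast, an expression that would replace or remove a key column (such as geo_value, time_value, or version) is expected to be rejected, as noted above.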

archive %>% group_by() and other expressions that group or regroup by zero columns (indicating that all rows should be treated as part of one large group) will output a grouped_epi_archive, in order to enable the use of grouped_epi_archive methods on the result. This is in slight contrast to the same operations on tibbles and grouped tibbles, which will not output a grouped_df in these circumstances.
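The zero-column case can be checked with a short sketch like the one below, which assumes a toy archive with made-up values and compares the result against the same operation on a plain tibble.

```r
library(dplyr)
library(epiprocess)

# Toy archive with made-up values, for illustration only.
toy_archive <- tibble(
  geo_value = "us",
  time_value = as.Date(c("2000-01-01", "2000-01-02", "2000-01-01")),
  version = as.Date(c("2000-01-02", "2000-01-03", "2000-01-03")),
  value = c(121, 130, 125)
) %>%
  as_epi_archive()

# Grouping by zero columns still yields a grouped_epi_archive.
zero_grouped <- toy_archive %>% group_by()
is_grouped_epi_archive(zero_grouped)
is_grouped_epi_archive(toy_archive)

# The same operation on a tibble does not yield a grouped_df.
plain_tbl <- tibble(x = 1:3) %>% group_by()
is_grouped_df(plain_tbl)
```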

Using group_by with .add = FALSE to override an existing grouping is disabled; to regroup, call ungroup first and then group_by.
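A minimal sketch of the two supported ways to change the grouping of an already-grouped archive, assuming a toy archive with made-up values similar to the one in the Examples section:

```r
library(dplyr)
library(epiprocess)

# Toy archive with made-up values; `age_group` is an extra key column.
toy_archive <- tibble(
  geo_value = "us",
  age_group = c("adult", "pediatric", "adult", "adult"),
  time_value = as.Date(c("2000-01-01", "2000-01-02", "2000-01-01", "2000-01-02")),
  version = as.Date(c("2000-01-02", "2000-01-03", "2000-01-03", "2000-01-03")),
  value = c(121, 5, 125, 130)
) %>%
  as_epi_archive(other_keys = "age_group")

by_geo <- toy_archive %>% group_by(geo_value)

# To switch to a different grouping, ungroup first and then group again.
by_age <- by_geo %>%
  ungroup() %>%
  group_by(age_group)
group_vars(by_age)

# To extend the existing grouping instead, use `.add = TRUE`.
by_both <- by_geo %>% group_by(age_group, .add = TRUE)
group_vars(by_both)
```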

group_by_drop_default on (ungrouped) epi_archives is expected to dispatch to group_by_drop_default.default (but there is a dedicated method for grouped_epi_archives).

Examples


grouped_archive <- archive_cases_dv_subset %>% group_by(geo_value)

# `print` for metadata and method listing:
grouped_archive %>% print()

# The primary use for grouping is to perform a grouped `epix_slide`:

archive_cases_dv_subset %>%
  group_by(geo_value) %>%
  epix_slide(
    .f = ~ mean(.x$case_rate_7d_av),
    .before = 2,
    .versions = as.Date("2020-06-11") + 0:2,
    .new_col_name = "case_rate_3d_av"
  ) %>%
  ungroup()

# -----------------------------------------------------------------

# Advanced: some other features of dplyr grouping are implemented:

library(dplyr)
toy_archive <-
  tribble(
    ~geo_value, ~age_group, ~time_value, ~version, ~value,
    "us", "adult", "2000-01-01", "2000-01-02", 121,
    "us", "pediatric", "2000-01-02", "2000-01-03", 5, # (addition)
    "us", "adult", "2000-01-01", "2000-01-03", 125, # (revision)
    "us", "adult", "2000-01-02", "2000-01-03", 130 # (addition)
  ) %>%
  mutate(
    age_group = ordered(age_group, c("pediatric", "adult")),
    time_value = as.Date(time_value),
    version = as.Date(version)
  ) %>%
  as_epi_archive(other_keys = "age_group")

# The following are equivalent:
toy_archive %>% group_by(geo_value, age_group)
toy_archive %>%
  group_by(geo_value) %>%
  group_by(age_group, .add = TRUE)
grouping_cols <- c("geo_value", "age_group")
toy_archive %>% group_by(across(all_of(grouping_cols)))

# And these are equivalent:
toy_archive %>% group_by(geo_value)
toy_archive %>%
  group_by(geo_value, age_group) %>%
  ungroup(age_group)

# To get the grouping variable names as a character vector:
toy_archive %>%
  group_by(geo_value) %>%
  group_vars()

# To get the grouping variable names as a `list` of `name`s (a.k.a. symbols):
toy_archive %>%
  group_by(geo_value) %>%
  groups()

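# `.drop = FALSE` can be passed when grouping by factor columns such as
# `age_group` (see `dplyr::group_by` for its meaning):
toy_archive %>%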
  group_by(geo_value, age_group, .drop = FALSE) %>%
  epix_slide(.f = ~ sum(.x$value), .before = 20) %>%
  ungroup()


cmu-delphi/epiprocess documentation built on Oct. 29, 2024, 5:37 p.m.