group_by: Group by variable(s) and implement operations

group_by_dt    R Documentation

Group by variable(s) and implement operations

Description

Carry out data manipulation within specified groups. Unlike group_dt, the work is split into two steps: grouping (group_by_dt) and execution (group_exe_dt).

These functions use setkey and setkeyv from data.table to provide functionality similar to group_by in dplyr. This is both convenient and computationally efficient.

Usage

group_by_dt(.data, ..., cols = NULL)

group_exe_dt(.data, ...)

Arguments

.data

A data frame

...

For group_by_dt: the variables to group by, namely the columns to sort by. Do not quote the column names. It can receive what select_dt receives. For group_exe_dt: any data manipulation expressions that could be applied to a data.frame.

cols

A character vector of column names to group by.

Details

group_by_dt and group_exe_dt are a pair of functions meant to be used in combination. They rely on the key-setting feature of data.table, which provides high performance for grouped operations, especially when you have to operate on the same groups frequently.
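
The key-setting idea described above can be illustrated with data.table alone. The following is a minimal sketch, not the internal implementation of group_by_dt or group_exe_dt: it assumes only that the data.table package is installed, and the object names dt and res are made up for the illustration.

```r
library(data.table)

# Work on a data.table copy of the built-in mtcars data
dt <- as.data.table(mtcars)

# setkeyv() takes column names as a character vector; it sorts dt by
# reference and records those columns as the key
setkeyv(dt, c("cyl", "am"))
key(dt)

# Grouped summary that reuses the stored key as the grouping columns
res <- dt[, .(mpg_sum = sum(mpg)), by = key(dt)]
res
```

Because the key is stored on the table, later grouped operations can reuse it without restating the grouping columns.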

Value

A data.table with keys
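
As a hedged illustration of the returned value, the key of the result can be inspected with key() and haskey() from data.table. This sketch assumes that both tidyfst and data.table are installed and that the key corresponds to the grouping column, as the Value description suggests. The object name grouped is made up for the illustration.

```r
library(tidyfst)

# Group iris by Species; per the Value section, the result carries a key
grouped <- group_by_dt(data.table::as.data.table(iris), Species)

# Inspect the key that was set on the returned data.table
data.table::haskey(grouped)
data.table::key(grouped)
```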

Examples


library(tidyfst)

# aggregation after grouping using group_exe_dt
a <- as.data.table(iris)
a %>%
  group_by_dt(Species) %>%
  group_exe_dt(head(1))

a %>%
  group_by_dt(Species) %>%
  group_exe_dt(
    head(3) %>%
      summarise_dt(sum = sum(Sepal.Length))
  )

mtcars %>%
  group_by_dt("cyl|am") %>%
  group_exe_dt(
    summarise_dt(mpg_sum = sum(mpg))
  )
# equivalent to
mtcars %>%
  group_by_dt(cols = c("cyl","am")) %>%
  group_exe_dt(
    summarise_dt(mpg_sum = sum(mpg))
  )

tidyfst documentation built on Sept. 16, 2024, 9:06 a.m.