stat_summarise: Fast grouped statistical summary for data frames.

View source: R/stat_summarise.R

stat_summarise    R Documentation

Fast grouped statistical summary for data frames.

Description

The collapse and data.table packages are used for the calculations.

Usage

stat_summarise(
  data,
  ...,
  stat = .stat_fns[1:3],
  q_probs = NULL,
  na.rm = TRUE,
  sort = df_group_by_order_default(data),
  .count_name = NULL,
  .names = NULL,
  .by = NULL,
  .cols = NULL,
  inform_stats = TRUE,
  as_tbl = FALSE
)

.stat_fns

Arguments

data

A data frame.

...

Variables to apply the statistical functions to. Tidy data-masking applies.

stat

A character vector of statistical summaries to apply. This can be one or more of the following:
"n", "nmiss", "ndistinct", "min", "max", "mean", "first", "last", "sd", "var", "mode", "median", "sum", "prop_complete".
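
As a rough guide to what the less familiar names compute, the base R sketch below reproduces a few of them for a single vector. The helper describe_vec() is hypothetical. Its definitions are assumptions inferred from the names, not taken from the package source: "n" as the number of rows, "ndistinct" as the number of distinct non-missing values, and "prop_complete" as the share of non-missing values.

```r
# Hypothetical helper: approximates a few of the listed summaries in base R.
# The definitions are assumptions inferred from the stat names.
describe_vec <- function(x) {
  non_missing <- x[!is.na(x)]
  c(
    n = length(x),                            # number of observations
    nmiss = sum(is.na(x)),                    # number of missing values
    ndistinct = length(unique(non_missing)),  # distinct non-missing values
    prop_complete = mean(!is.na(x))           # share of non-missing values
  )
}

describe_vec(c(1, 2, 2, NA))
```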

q_probs

(Optional) Quantile probabilities. If supplied, q_summarise() is called and added to the result.

na.rm

Should NA values be removed? Default is TRUE.

sort

Should groups be sorted? The default is given by df_group_by_order_default(data).

.count_name

Name of the count column. If NULL (the default), "n" is used.

.names

An optional glue specification passed to stringr::str_glue(). If .names = NULL, the default depends on the number of variables and functions: with 1 variable, the function name is used, i.e. .names = "{.fn}"; with multiple variables and 1 function, the variable names are used, i.e. .names = "{.col}"; and with multiple variables and multiple functions, "{.col}_{.fn}" is used.
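
The default naming rule can be sketched in base R as follows. The helper default_names() is hypothetical and only mirrors the three cases described above. The ordering of the combined names (all functions for the first variable, then the next variable) is an assumption.

```r
# Hypothetical helper mirroring the default naming rule described above.
default_names <- function(cols, fns) {
  if (length(cols) == 1) {
    fns    # one variable: "{.fn}"
  } else if (length(fns) == 1) {
    cols   # several variables, one function: "{.col}"
  } else {
    # several variables and functions: "{.col}_{.fn}"
    # (ordering is an assumption: all functions per variable, in turn)
    as.vector(t(outer(cols, fns, paste, sep = "_")))
  }
}

default_names("Sepal.Length", c("min", "max"))
default_names(c("Sepal.Width", "Petal.Width"), "mean")
default_names(c("Sepal.Width", "Petal.Width"), c("min", "max"))
```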

.by

(Optional) A selection of columns to group by for this operation. Columns are specified using tidy-select.

.cols

(Optional) Alternative to ... that accepts a named character vector or numeric vector. Recommended when speed is a priority.

inform_stats

Should available stat functions be displayed at the start of each session? Default is TRUE.

as_tbl

Should the result be a tibble? Default is FALSE.

Format

.stat_fns

An object of class character of length 14.

Details

stat_summarise() can apply multiple functions to multiple variables.

stat_summarise() is equivalent to
data %>% group_by(...) %>% summarise(across(..., list(...)))
but is faster and supports only the limited set of statistical functions listed under stat.
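
For comparison, the dplyr pipeline that this function is described as replacing might look like the sketch below for two columns of iris with the "mean" and "sd" summaries. The choice of columns, the object name slow_summary, and the naming pattern are illustrative assumptions. The output of stat_summarise() may differ in column order, class, or extra columns such as the count.

```r
library(dplyr)

# dplyr counterpart: mean and sd of two columns, per Species.
# na.rm = TRUE mirrors the documented default of stat_summarise().
slow_summary <- iris %>%
  group_by(Species) %>%
  summarise(
    across(
      c(Sepal.Length, Sepal.Width),
      list(mean = ~ mean(.x, na.rm = TRUE), sd = ~ sd(.x, na.rm = TRUE)),
      .names = "{.col}_{.fn}"
    ),
    .groups = "drop"
  )
slow_summary
```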

Value

A summary data.table containing the summary values for each group.

See Also

q_summarise

Examples

library(timeplyr)
library(dplyr)

stat_df <- iris %>%
  stat_summarise(Sepal.Length, .by = Species)
# Join quantile info too
q_df <- iris %>%
  q_summarise(Sepal.Length, .by = Species)
summary_df <- left_join(stat_df, q_df, by = "Species")
summary_df

# Multiple cols
iris %>%
  group_by(Species) %>%
  stat_summarise(across(contains("Width")),
                 stat = c("min", "max", "mean", "sd"))


timeplyr documentation built on Sept. 12, 2024, 7:37 a.m.