f_summarise: Summarise each group down to one row

f_summarise R Documentation

Summarise each group down to one row

Description

Like dplyr::summarise() but with some internal optimisations for common statistical functions.

Usage

f_summarise(.data, ..., .by = NULL, .order = group_by_order_default(.data))

f_summarize(.data, ..., .by = NULL, .order = group_by_order_default(.data))

Arguments

.data

A data frame.

...

Name-value pairs of summary functions. Expressions with across() are also accepted.

.by

(Optional) A selection of columns to group by for this operation. Columns are specified using tidy-select.

.order

Should the groups be returned in sorted order? If FALSE, the groups are returned in order of first appearance, which is faster in many cases.

Value

An un-grouped data frame of summaries by group.
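
As a minimal sketch of the .order argument and the un-grouped return value described above, the following uses a small made-up data frame; the column names g and x and the object names are illustrative assumptions, not part of the package.

```r
library(fastplyr)
options(fastplyr.inform = FALSE)

# Hypothetical toy data: group "b" appears before group "a"
df <- data.frame(g = c("b", "a", "b", "a"), x = 1:4)

# Groups returned in sorted order
sorted <- f_summarise(df, total = sum(x), .by = g, .order = TRUE)

# Groups returned in order of first appearance
unsorted <- f_summarise(df, total = sum(x), .by = g, .order = FALSE)

sorted$g
unsorted$g

# The result carries no grouping
inherits(sorted, "grouped_df")
```

Only the row order should differ between the two results; the per-group totals are the same.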

Details

fastplyr data-masking functions such as f_mutate and f_summarise operate very similarly to their dplyr counterparts, but with some crucial differences. Optimisations for by-group operations kick in for the common statistical functions listed below. When this happens, a message is printed, which can be disabled by running options(fastplyr.inform = FALSE). Optimised expressions no longer obey the data-masking rules for sequential and dependent expression execution. For example, the pseudo-code f_summarise(data, mean = mean(x), mean2 = round(mean), .by = g) will not work when optimised, because the named column mean is not visible to later expressions.

One can disable fastplyr optimisations globally by running options(fastplyr.optimise = FALSE).
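
One way to work with the restriction on dependent expressions is to split the work into two calls, so that the second expression only runs once the summary column exists. This is a sketch under that assumption, using a made-up data frame; the names df, means and res are illustrative.

```r
library(fastplyr)
options(fastplyr.inform = FALSE)

# Hypothetical toy data
df <- data.frame(g = c("a", "a", "b", "b"), x = c(1.2, 2.4, 3.3, 4.9))

# Step 1: the summary that may be optimised (one row per group)
means <- f_summarise(df, mean = mean(x), .by = g)

# Step 2: the dependent expression, run in a separate call so that
# the column `mean` already exists in the data
res <- f_mutate(means, mean2 = round(mean))
res
```

Splitting the calls keeps the by-group optimisation for the summary step while still allowing a later expression to refer to its result.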

Optimised statistical functions

Some functions are internally optimised using 'collapse' fast statistical functions. This makes execution on many groups very fast.

For fast quantiles (percentiles) by group, see tidy_quantiles.

List of currently optimised functions

dplyr::n -> <custom_expression>
dplyr::row_number -> <custom_expression> (only for f_mutate)
dplyr::cur_group -> <custom_expression>
dplyr::cur_group_id -> <custom_expression>
dplyr::cur_group_rows -> <custom_expression> (only for f_mutate)
dplyr::lag -> <custom_expression> (only for f_mutate)
dplyr::lead -> <custom_expression> (only for f_mutate)
base::sum -> collapse::fsum
base::prod -> collapse::fprod
base::min -> collapse::fmin
base::max -> collapse::fmax
base::mean -> collapse::fmean
stats::median -> collapse::fmedian
stats::sd -> collapse::fsd
stats::var -> collapse::fvar
dplyr::first -> collapse::ffirst
dplyr::last -> collapse::flast
dplyr::n_distinct -> collapse::fndistinct
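
To illustrate what a mapping such as base::sum -> collapse::fsum means, the sketch below calls the 'collapse' function directly with a grouping vector and compares it with a base R grouped sum. It shows the 'collapse' function on its own and is not a description of how fastplyr invokes it internally; the vectors x and g are made up.

```r
library(collapse)

# Hypothetical values and group labels
x <- c(1, 2, 3, 4, 5)
g <- c("a", "b", "a", "b", "a")

# Grouped sum in a single vectorised call
fast <- fsum(x, g = g)

# The same grouped sum in base R, applying sum() to each group in turn
by_base <- tapply(x, g, sum)

fast
all.equal(unname(fast), as.vector(by_base))
```

Note that the 'collapse' functions default to na.rm = TRUE, unlike their base R counterparts, so results can differ on data with missing values when the functions are called directly like this.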

See Also

tidy_quantiles

Examples

library(fastplyr)
library(nycflights13)
library(dplyr)
options(fastplyr.inform = FALSE)
# Number of flights per month, including first and last day
flights |>
  f_group_by(year, month) |>
  f_summarise(first_day = first(day),
              last_day = last(day),
              num_flights = n())

## Fast mean summary using `across()`

flights |>
  f_summarise(
    across(where(is.numeric), mean),
    .by = tailnum
  )

# The same summary, grouping with f_group_by() and a character vector of column names
flights |>
  f_group_by(.cols = "tailnum") |>
  f_summarise(
    across(where(is.numeric), mean)
  )

fastplyr documentation built on June 8, 2025, 11:18 a.m.