f_mutate: A faster 'mutate()' with per-group optimisations

f_mutate R Documentation

A faster mutate() with per-group optimisations

Description

A faster mutate() with per-group optimisations

Usage

f_mutate(
  .data,
  ...,
  .by = NULL,
  .order = group_by_order_default(.data),
  .keep = "all"
)

Arguments

.data

A data frame.

...

Name-value pairs of expressions that create or modify columns. Expressions with across() are also accepted.

.by

(Optional) A selection of columns to group by for this operation. Columns are specified using tidy-select.

.order

Should the groups be returned in sorted order? If FALSE, groups are returned in order of first appearance, which is faster in many cases.

.keep

Which columns to keep. Options are 'all', 'used', 'unused' and 'none'.

Value

A data frame with added columns.
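As a minimal sketch of the arguments above, the following adds a per-group mean with .by and then uses .keep to limit the returned columns. The data frame df and its columns g and x are made up for illustration, and the .keep behaviour is assumed to mirror dplyr::mutate().

```r
library(fastplyr)

# Hypothetical example data: two groups, four rows
df <- data.frame(
  g = c("a", "b", "a", "b"),
  x = c(1, 2, 3, 4)
)

# Add the mean of x within each group of g; rows keep their original order
out <- f_mutate(df, mean_x = mean(x), .by = g)

# Ungrouped call that keeps only the newly created column
# (assumed to behave like .keep = "none" in dplyr::mutate())
kept <- f_mutate(df, double_x = x * 2, .keep = "none")
```

Because .by applies to this call only, the input does not need to be grouped beforehand.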

Details

fastplyr data-masking functions such as f_mutate and f_summarise behave much like their dplyr counterparts, with some crucial differences. Optimisations for by-group operations kick in for the common statistical functions listed below. When an expression is optimised, a message is printed; disable it by running options(fastplyr.inform = FALSE). Optimised expressions no longer obey the data-masking rules for sequential, dependent evaluation. For example, the pseudocode f_summarise(data, mean = mean(x), mean2 = round(mean), .by = g) will not work when optimised, because the new column mean is not visible to later expressions.
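One way to sidestep the dependent-expression limitation is to split the work across two calls, so the second call sees the column created by the first. This is a sketch with made-up data (df, g, x) and made-up column names, not a workaround prescribed by the package.

```r
library(fastplyr)

# Hypothetical example data
df <- data.frame(
  g = c("a", "a", "b"),
  x = c(1.2, 2.4, 3.6)
)

# Step 1: the grouped mean, which is eligible for optimisation
step1 <- f_mutate(df, mean_x = mean(x), .by = g)

# Step 2: a separate call, so mean_x already exists as a real column
step2 <- f_mutate(step1, mean_x_rounded = round(mean_x))
```

Each call contains only independent expressions, so the result does not depend on whether the first call was optimised.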

One can disable fastplyr optimisations globally by running options(fastplyr.optimise = FALSE).
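Since these are ordinary R options, they can be changed temporarily and restored afterwards using the value that base::options() returns. The sketch below uses only the two option names mentioned in this section; the data are made up.

```r
library(fastplyr)

# Hypothetical example data
df <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 3))

# options() returns the previous values, so they can be restored later
old <- options(fastplyr.optimise = FALSE, fastplyr.inform = FALSE)

# This call runs with optimisations and the message switched off
res <- f_mutate(df, total = sum(x), .by = g)

# Restore whatever was set before
options(old)
```

Inside a function, on.exit(options(old)) is the usual base R idiom for making sure the previous settings come back even if an error occurs.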

Optimised statistical functions

Some functions are internally optimised using 'collapse' fast statistical functions. This makes execution on many groups very fast.

For fast quantiles (percentiles) by group, see tidy_quantiles.

List of currently optimised functions

dplyr::n -> <custom_expression>
dplyr::row_number -> <custom_expression> (only for f_mutate)
dplyr::cur_group -> <custom_expression>
dplyr::cur_group_id -> <custom_expression>
dplyr::cur_group_rows -> <custom_expression> (only for f_mutate)
dplyr::lag -> <custom_expression> (only for f_mutate)
dplyr::lead -> <custom_expression> (only for f_mutate)
base::sum -> collapse::fsum
base::prod -> collapse::fprod
base::min -> collapse::fmin
base::max -> collapse::fmax
base::mean -> collapse::fmean
stats::median -> collapse::fmedian
stats::sd -> collapse::fsd
stats::var -> collapse::fvar
dplyr::first -> collapse::ffirst
dplyr::last -> collapse::flast
dplyr::n_distinct -> collapse::fndistinct
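
To illustrate, the sketch below uses a few of the listed functions in one grouped call. The data and column names are made up, and every expression is independent of the others, so the result is the same whether or not the optimised path is taken. dplyr::n() is written with its namespace so dplyr does not need to be attached; this assumes dplyr is installed.

```r
library(fastplyr)

# Hypothetical example data: group "b" has three rows, group "a" has two
df <- data.frame(
  g = c("b", "a", "b", "a", "b"),
  x = c(5, 1, 3, 2, 4)
)

out <- f_mutate(
  df,
  size    = dplyr::n(),  # rows per group
  total   = sum(x),      # per-group sum
  biggest = max(x),      # per-group maximum
  .by = g
)
```

Each new column repeats its group-level value on every row of that group, as with dplyr::mutate().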


fastplyr documentation built on June 8, 2025, 11:18 a.m.