f_summarise | R Documentation |
Like dplyr::summarise() but with some internal optimisations for common statistical functions.
f_summarise(.data, ..., .by = NULL, .order = group_by_order_default(.data))
f_summarize(.data, ..., .by = NULL, .order = group_by_order_default(.data))
.data | A data frame.
... | Name-value pairs of summary functions. Expressions with
.by | (Optional). A selection of columns to group by for this operation. Columns are specified using tidy-select.
.order | Should the groups be returned in sorted order? If
An un-grouped data frame of summaries by group.
fastplyr data-masking functions like f_mutate and f_summarise operate very similarly to their dplyr counterparts, but with some crucial differences.
Optimisations for by-group operations kick in for common statistical functions, which are detailed below.
A message will be printed, which one can disable by running options(fastplyr.inform = FALSE).
When this happens, the optimised expressions no longer obey the data-masking rules for sequential and dependent expression execution.
For example, the pseudocode
f_summarise(data, mean = mean(x), mean2 = round(mean), .by = g)
will not work when optimised, because the named column mean will not be visible in later expressions.
One can disable fastplyr optimisations globally by running options(fastplyr.optimise = FALSE).
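The dependency limitation described above can be worked around in at least two ways. The following is a minimal sketch, not an official recipe. The toy data frame and the column names g, x, mean and mean2 are hypothetical. The second option assumes that switching off fastplyr.optimise restores sequential evaluation, as the text above suggests.

```r
library(fastplyr)

# Hypothetical toy data: `g` is the grouping column, `x` the value column
data <- data.frame(g = c("a", "a", "b", "b"), x = c(1.2, 2.0, 3.0, 5.0))

# Option 1: compute the optimised summary first, then derive the
# dependent column in a separate step outside of f_summarise()
means <- f_summarise(data, mean = mean(x), .by = g)
means$mean2 <- round(means$mean)

# Option 2 (assumption: with optimisations off, later expressions can
# see earlier ones): disable optimisations temporarily, then restore
old <- options(fastplyr.optimise = FALSE)
res <- f_summarise(data, mean = mean(x), mean2 = round(mean), .by = g)
options(old)
```

Option 1 keeps the fast by-group path for the summary itself. Option 2 trades that speed for dplyr-like semantics.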
Some functions are internally optimised using 'collapse' fast statistical functions. This makes execution on many groups very fast.
For fast quantiles (percentiles) by group, see tidy_quantiles
List of currently optimised functions
dplyr::n -> <custom_expression>
dplyr::row_number -> <custom_expression> (only for f_mutate)
dplyr::cur_group -> <custom_expression>
dplyr::cur_group_id -> <custom_expression>
dplyr::cur_group_rows -> <custom_expression> (only for f_mutate)
dplyr::lag -> <custom_expression> (only for f_mutate)
dplyr::lead -> <custom_expression> (only for f_mutate)
base::sum -> collapse::fsum
base::prod -> collapse::fprod
base::min -> collapse::fmin
base::max -> collapse::fmax
base::mean -> collapse::fmean
stats::median -> collapse::fmedian
stats::sd -> collapse::fsd
stats::var -> collapse::fvar
dplyr::first -> collapse::ffirst
dplyr::last -> collapse::flast
dplyr::n_distinct -> collapse::fndistinct
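To give a feel for what such a mapping means, the sketch below calls a 'collapse' fast statistical function directly on a grouped vector and compares it with a base R equivalent. It only illustrates the kind of function being substituted. It is not a description of how fastplyr calls collapse internally, and the vectors x and g are hypothetical.

```r
library(collapse)

# Hypothetical toy vectors: values and their group labels
x <- c(1, 2, 3, 5)
g <- c("a", "a", "b", "b")

# Grouped mean computed in a single vectorised call
fast_means <- fmean(x, g = g)

# Base R equivalent, which applies mean() once per group
base_means <- tapply(x, g, mean)
```

Note that collapse functions default to na.rm = TRUE while base R functions default to na.rm = FALSE, so the two can differ when the data contain missing values.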
tidy_quantiles
library(fastplyr)
library(nycflights13)
library(dplyr)
options(fastplyr.inform = FALSE)

# Number of flights per month, including first and last day
flights |>
  f_group_by(year, month) |>
  f_summarise(first_day = first(day),
              last_day = last(day),
              num_flights = n())

## Fast mean summary using `across()`
flights |>
  f_summarise(
    across(where(is.numeric), mean),
    .by = tailnum
  )

flights |>
  f_group_by(.cols = "tailnum") |>
  f_summarise(
    across(where(is.numeric), mean)
  )