Nothing
#' 'collapse' version of `dplyr::group_by()`
#'
#' @description
#' This works the exact same as `dplyr::group_by()` and typically
#' performs around the same speed but uses slightly less memory.
#'
#' @param data data frame.
#' @param ... Variables to group by.
#' @param .add Should groups be added to existing groups?
#' Default is `FALSE`.
#' @param .order Should groups be ordered? If `FALSE`
#' groups will be ordered based on first-appearance. \cr
#' Typically, setting order to `FALSE` is faster.
#' @param .by (Optional). A selection of columns to group by for this operation.
#' Columns are specified using `tidyselect`.
#' @param .cols (Optional) alternative to `...` that accepts
#' a named character vector or numeric vector.
#' If speed is an expensive resource, it is recommended to use this.
#' @param .drop Should unused factor levels be dropped? Default is `TRUE`.
#'
#'
#' @returns
#' `f_group_by()` returns a `grouped_df` that can be used
#' for further for grouped calculations.
#'
#' `group_ordered()` returns `TRUE` if the group data are sorted,
#' i.e if `attr(attr(data, "groups"), "ordered") == TRUE`. If sorted,
#' which is usually the default, this leads to summary calculations
#' like `f_summarise()` or `dplyr::summarise()` producing sorted groups.
#' If `FALSE` they are returned based on order-of-first appearance in the data.
#'
#' @details
#' `f_group_by()` works almost exactly like the 'dplyr' equivalent.
#' An attribute "ordered" (`TRUE` or `FALSE`) is added to the group data to
#' signify if the groups are sorted or not.
#'
#' ### Ordered vs Sorted
#'
#' The distinction between ordered and sorted is somewhat subtle.
#' Functions in fastplyr that use a `sort` argument generally refer
#' to the top-level dataset being sorted in some way, either by sorting
#' the group columns like in `f_expand()` or `f_distinct()`, or
#' some other columns, like the count column in `f_count()`.
#'
#' The `.order` argument, when set to `TRUE` (the default),
#' is used to mean that the group data will be calculated
#' using a sort-based algorithm, leading to sorted group data.
#' When `.order` is `FALSE`, the group data will be returned based on
#' the order-of-first appearance of the groups in the data.
#' This order-of-first appearance may still naturally be sorted
#' depending on the data.
#' For example, `group_id(1:3, order = T)` results in the same group IDs
#' as `group_id(1:3, order = F)` because 1, 2, and 3 appear in the data in
#' ascending sequence whereas `group_id(3:1, order = T)` does not equal
#' `group_id(3:1, order = F)`
#'
#'
#' Part of the reason for the distinction is that internally fastplyr
#' can in theory calculate group data
#' using the sort-based algorithm and still return unsorted groups,
#' though this combination is only available to the user in limited places like
#' `f_distinct(.order = TRUE, .sort = FALSE)`.
#'
#' The other reason is to prevent confusion in the meaning
#' of `sort` and `order` so that `order` always refers to the
#' algorithm specified, resulting in sorted groups, and `sort` implies a
#' physical sorting of the returned data. It's also worth mentioning that
#' in most functions, `sort` will implicitly utilise the sort-based algorithm
#' specified via `order = TRUE`.
#'
#'
#' ### Using the order-of-first appearance algorithm for speed
#'
#' In many situations (not all) it can be faster to use the
#' order-of-first appearance algorithm, specified via `.order = FALSE`.
#'
#' This can generally be accessed by first calling
#' `f_group_by(data, ..., .order = FALSE)` and then
#' performing your calculations.
#'
#' To utilise this algorithm more globally and package-wide,
#' set the '.fastplyr.order.groups' option to `FALSE` using the code:
#' `options(.fastplyr.order.groups = FALSE)`.
#'
#'
#'
#' @rdname f_group_by
#' @export
#'
f_group_by <- function(data, ..., .add = FALSE,
.order = group_by_order_default(data),
.by = NULL, .cols = NULL,
.drop = df_group_by_drop_default(data)){
init_group_vars <- group_vars(data)
group_info <- tidy_eval_groups(
cpp_ungroup(data), ...,
.by = {{ .by }},
.cols = .cols,
.order = .order,
return_order = .order
)
out <- group_info[[1L]]
GRP <- group_info[[2L]]
groups <- GRP_group_vars(GRP)
if (.add){
order_unchanged <- .order == group_by_order_default(data)
drop_unchanged <- .drop == df_group_by_drop_default(data)
no_extra_groups <- length(groups) == 0 || (length(vec_setdiff(groups, init_group_vars)) == 0)
if (order_unchanged && drop_unchanged && no_extra_groups){
return(data)
}
GRP <- df_to_GRP(out, c(init_group_vars, groups), order = .order)
}
construct_fastplyr_grouped_df(out, GRP, drop = .drop)
}
#' @rdname f_group_by
#' @export
group_ordered <- function(data){
attr(group_data(data), "ordered") %||% TRUE
}
#' @rdname f_group_by
#' @export
f_ungroup <- cpp_ungroup
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.