R/fast_aggregate.R

#' Aggregate data much faster using dplyr
#'
#' This is a convenience wrapper for aggregating your data using \pkg{dplyr} functions that tend to be much faster than the
#' usual [stats::aggregate()] command. It is also easy to call from within a function. *This function is not exported.*
#'
#' @param data A `data.frame` that contains the data.
#' @param factors Character. A vector of factor names to aggregate data by.
#' @param dv Character. The name of the dependent variable to aggregate. Must match a column name in `data` exactly.
#' @param fun Function. The function used for aggregation (e.g., `mean`). It must accept an `na.rm` argument.
#' @param na.rm Logical. Passed on to `fun` as its `na.rm` argument. Defaults to `TRUE`.
#' @keywords internal

fast_aggregate <- function(data, factors, dv, fun, na.rm = TRUE) {
  # subset: this is a bit faster than subset.data.frame; drop = FALSE keeps a data.frame even if only one column matches
  data <- data[, colnames(data) %in% c(factors, dv), drop = FALSE]
  # the dplyr magic: this construct seems to be as fast as using pipes
  grouped <- dplyr::grouped_df(data, vars = factors, drop = TRUE)

  # build the call `fun(x = <dv column>, na.rm = na.rm)` and evaluate it within each group
  dv_ <- dplyr::sym(dv)
  args <- list(x = dplyr::quo(!!dv_), na.rm = na.rm)
  agg_data <- as.data.frame(
    dplyr::summarise(.data = grouped, temporary_dv_name = fun(!!!args))
  )
  # rename in base R to avoid using `:=`
  colnames(agg_data)[colnames(agg_data) == "temporary_dv_name"] <- dv

  # soft-deprecated in dplyr:
  # agg_data <- as.data.frame(dplyr::summarise_all(.tbl = grouped, .funs = dplyr::funs(fun(., na.rm = TRUE))))

  return(agg_data)
}
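
A minimal usage sketch, assuming the papaja and dplyr packages are installed. The data frame `toy` and its column names (`id`, `condition`, `rt`) are made up for illustration and do not come from the package. Because `fast_aggregate()` is not exported, the sketch reaches it with `:::`, which is fine for exploration but not something to rely on in other packages.

```r
# Hypothetical toy data: two participants, two conditions, one missing value
toy <- data.frame(
  id = rep(1:2, each = 4),
  condition = rep(c("a", "b"), times = 4),
  rt = c(1, 2, 3, NA, 5, 6, 7, 8)
)

# Aggregate `rt` within each id x condition cell.
# Column names are passed as character strings; na.rm defaults to TRUE,
# so the NA in cell (id = 1, condition = "b") is dropped before averaging.
agg <- papaja:::fast_aggregate(
  data = toy,
  factors = c("id", "condition"),
  dv = "rt",
  fun = mean
)

print(agg)
# One row per cell, ordered by id, then condition; cell means of rt: 2, 2, 6, 7

# For comparison, the base R call the wrapper is meant to replace.
# Note that stats::aggregate() orders its rows differently.
ref <- stats::aggregate(rt ~ id + condition, data = toy, FUN = mean)
print(ref)
```

Passing the grouping and outcome columns as strings is what makes the wrapper convenient inside other functions: the caller can build `factors` and `dv` programmatically without any tidy-evaluation code of its own.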

papaja documentation built on Sept. 29, 2023, 9:07 a.m.