aggregate_multiple_fun: Wrapper to 'aggregate'
In SSBtools: Algorithms and Tools for Tabular Statistics and Hierarchical Computations

aggregate_multiple_fun

R Documentation

Wrapper to `aggregate`

Description

Wrapper to `aggregate` that allows multiple functions and functions of several variables.
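For contrast with the description above, here is what base `aggregate` does on its own: a single `FUN` is applied to every aggregated column. The toy data frame `df` is made up for illustration and is not part of SSBtools.

```r
# Toy data, invented for this illustration
df <- data.frame(g = c("a", "a", "b"),
                 x = c(1, 2, 4),
                 y = c(10, 20, 40))

# Base aggregate: one function (sum) applied to both x and y within each group
res <- aggregate(df[c("x", "y")], by = df["g"], FUN = sum)
print(res)
```

Getting, say, the sum of `x` and the median of `y` in one call, or a function of `x` and `y` together, is the kind of task the wrapper is meant to cover.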

Usage

aggregate_multiple_fun(
  data,
  by,
  vars,
  fun = NULL,
  ind = NULL,
  ...,
  name_sep = "_",
  seve_sep = ":",
  multi_sep = ",",
  forward_dots = FALSE,
  dots2dots = FALSE,
  do_unmatrix = TRUE,
  do_unlist = TRUE,
  inc_progress = FALSE
)

Arguments

`data`	A data frame containing data to be aggregated
`by`	A data frame defining grouping
`vars`	A named vector or list of variable names in `data`. The elements are named by the names of `fun`. All the pairs of variable names and function names thus define all the result variables to be generated. Parameter `vars` will be converted to an internal standard by the function `fix_vars_amf`. Thus, function names and also output variable names can be coded in different ways. Multiple output variable names can be coded using `multi_sep`. See the examples here and the examples in `fix_vars_amf`. Indices instead of variable names are allowed. Some names may be omitted, since the name can be omitted for one function (see `fun` below). A special feature is the combination of a single unnamed variable with all functions named. In this case, all functions are run and the output variable names will be identical to the function names.
`fun`	A named list of functions. These names will be used as suffixes in output variable names. The name can be omitted for one function. A vector of functions as strings is also possible. When unnamed, these function names will be used directly. See the examples of `fix_fun_amf`, which is the function used to convert `fun`. When `fun` is not specified, the functions, as strings, are taken from the function names coded in `vars`.
`ind`	When non-NULL, a data frame of indices. When NULL, this variable will be generated internally as `data.frame(ind = seq_len(nrow(data)))`. The parameter is useful for advanced use involving model/dummy matrices. For special use (`dummy = FALSE` in `dummy_aggregate`) `ind` can also be a two-column data frame.
`...`	Further arguments passed to `aggregate` and, depending on `forward_dots`/`dots2dots`, forwarded to the functions in `fun` (see details).
`name_sep`	A character string used when output variable names are generated.
`seve_sep`	A character string used when output variable names are generated from functions of several variables.
`multi_sep`	A character string used when multiple output variable names are sent as input.
`forward_dots`	Logical vector (possibly recycled) for each element of `fun` that determines whether `...` should be forwarded (see details).
`dots2dots`	Logical vector (possibly recycled) specifying the behavior when `forward_dots = TRUE` (see details).
`do_unmatrix`	By default (`TRUE`), the implementation uses `unmatrix` before returning output. For special use this can be omitted (`FALSE`).
`do_unlist`	By default (`TRUE`), the implementation uses `unlist` to combine output from multiple functions. For special use this can be omitted (`FALSE`).
`inc_progress`	Logical, `NULL` (same as `FALSE`) or a progress indicator function taking two parameters (i and n). `TRUE` means the same as `inc_default`. Note that this feature is implemented in a hacky manner, as internal/hidden variables are grabbed from `aggregate`.
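As a rough picture of how the three separators described above might combine, the sketch below uses only base R. It is an assumption drawn from the argument descriptions, not code taken from the package; the helper `make_name` is hypothetical.

```r
# Hypothetical helper (not from SSBtools): variable names are joined by
# seve_sep, and the function name is appended as a suffix after name_sep.
make_name <- function(vars, fun_name, name_sep = "_", seve_sep = ":") {
  paste(paste(vars, collapse = seve_sep), fun_name, sep = name_sep)
}

one_var <- make_name("freq", "sum")             # one variable, one function
two_vars <- make_name(c("y", "freq"), "wmean")  # function of two variables

# multi_sep: several output names supplied as a single string are split apart
out_names <- strsplit("freqmin,freqmax", split = ",", fixed = TRUE)[[1]]

print(one_var)
print(two_vars)
print(out_names)
```

The actual names produced by `aggregate_multiple_fun` are best checked by running the examples in this page.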

Details

One intention of aggregate_multiple_fun is to be a true generalization of aggregate. However, when many functions are involved, passing extra parameters can easily lead to errors. Therefore forward_dots and dots2dots are set to FALSE by default. When forward_dots = TRUE and dots2dots = FALSE, parameters will be forwarded, but only parameters that are explicitly defined in the specific fun function. For the sum function, this means that a possible na.rm parameter is forwarded but not others. When forward_dots = TRUE and dots2dots = TRUE, other parameters will also be forwarded to fun functions where ... is included. For the sum function, this means that such extra parameters will, probably erroneously, be included in the summation (see examples).
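The difference between the two forwarding modes can be pictured with base R alone. The sketch below is not the package's implementation; `keep_explicit` is a hypothetical helper that keeps only the extra arguments a function names explicitly, which is the behavior described for `forward_dots = TRUE` with `dots2dots = FALSE`.

```r
# Hypothetical helper (not from SSBtools): keep only the extra arguments
# that `fun` names explicitly among its formals, ignoring `...`.
keep_explicit <- function(fun, dots) {
  explicit <- setdiff(names(formals(args(fun))), "...")
  dots[names(dots) %in% explicit]
}

extra <- list(digits = 3, na.rm = TRUE)
x <- c(1, 2)

# Selective forwarding: sum receives only na.rm, round receives only digits.
selective <- do.call(sum, c(list(x), keep_explicit(sum, extra)))
rounded <- do.call(round, c(list(pi), keep_explicit(round, extra)))

# Forwarding everything: digits ends up in sum's `...` and is added to the total.
everything <- do.call(sum, c(list(x), extra))

print(c(selective = selective, everything = everything, rounded = rounded))
```

Here `selective` is the intended sum of `x`, while `everything` is larger by the value of `digits`, which is the kind of silent error the `FALSE` defaults guard against.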

For the function to work with dummy_aggregate, the data is subject to unlist before the fun functions are called. This does not apply in the special case where ind is a two-column data frame. Then, in the case of list data, the fun functions have to handle this themselves.
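One way to see why an index data frame such as `ind` is useful: if base `aggregate` is run on row indices rather than on the data itself, the function applied to each group can look up several columns at once. The sketch below illustrates that idea with invented data; it is an assumption about the general approach, not a copy of the package internals.

```r
# Toy data, invented for this illustration
df <- data.frame(g = c("a", "a", "b", "b"),
                 y = c(1, 3, 2, 4),
                 w = c(1, 3, 1, 1))

# Same construction as the documented default for `ind`
ind <- data.frame(ind = seq_len(nrow(df)))

# Aggregate the indices; each group's indices select rows of y and w,
# so a function of two variables (weighted.mean) can be evaluated.
res <- aggregate(ind, by = df["g"],
                 FUN = function(i) weighted.mean(df$y[i], df$w[i]))
print(res)
```

The result column is still called `ind` here; giving it a meaningful output name is left out of this sketch.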

A limitation of the default output (do_unlist = TRUE) is that all variables in the output are forced to have the same class. This is caused by the unlist function being run on the output. For example, if one result should be integer and another numeric, both will be returned as numeric.
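The class coercion comes from how base R's `unlist` behaves, which can be checked directly. The list `out` below is made up to stand in for the results of two functions.

```r
# Invented stand-in for the results of two functions: one integer, one double
out <- list(n = 3L, mean_y = 1.5)
classes_before <- sapply(out, class)

# unlist returns an atomic vector, so the integer is coerced to double
combined <- unlist(out)
class_after <- class(combined)

print(classes_before)
print(class_after)
```

When the original classes matter, `do_unlist = FALSE` is the documented way to skip this step.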

Value

A data frame

Examples

d2 <- SSBtoolsData("d2")
set.seed(12)
d2$y <- round(rnorm(nrow(d2)), 2)
d <- d2[sample.int(nrow(d2), size = 20), ]
aggregate_multiple_fun(
   data = d, 
   by = d[c("k_group", "main_income")], 
   vars = c("freq", "y", median = "freq", median = "y", e1 = "freq"),
   fun = c(sum, median = median, e1 = function(x) x[1])  
)

# With functions as named strings 
aggregate_multiple_fun(
   data = d, 
   by = d[c("k_group", "main_income")], 
   vars = c(sum = "y", med = "freq", med = "y"),
   fun = c(sum = "sum", med = "median")
)

# Without specifying functions 
# - equivalent to `fun = c("sum", "median")` 
aggregate_multiple_fun(
   data = d, 
   by = d[c("k_group", "main_income")], 
   vars = c(sum = "y", median = "freq", median = "y")
)

# The single unnamed variable feature. Also functions as strings. 
aggregate_multiple_fun(
   data = d, 
   by = d[c("k_group", "main_income")], 
   vars = "y",
   fun = c("sum", "median", "min", "max")
) 

# with multiple outputs (function my_range)
# and with function of two variables (weighted.mean(y, freq))
my_range <- function(x) c(min = min(x), max = max(x))
aggregate_multiple_fun(
   data = d, 
   by = d[c("k_group", "main_income")], 
   vars = list("freq", "y", ra = "freq", wmean  = c("y", "freq")),
   fun = c(sum, ra = my_range, wmean = weighted.mean)
)

# with specified output variable names
my_range <- function(x) c(min = min(x), max = max(x))
aggregate_multiple_fun(
   data = d, 
   by = d[c("k_group", "main_income")], 
   vars = list("freq", "y", 
               `freqmin,freqmax` = list(ra = "freq"), 
                yWmean  = list(wmean  = c("y", "freq"))),
   fun = c(sum, ra = my_range, wmean = weighted.mean)
)


# To illustrate forward_dots and dots2dots
q <- d[1, ]
q$w <- 100 * rnorm(1)
for (dots2dots in c(FALSE, TRUE)) for (forward_dots in c(FALSE, TRUE)) {
  cat("\n=======================================\n")
  cat("forward_dots =", forward_dots, ", dots2dots =", dots2dots)
  out <- aggregate_multiple_fun(
    data = q, by = q["k_group"], 
    vars = c(sum = "freq", round = "w"), fun = c("sum", "round"),  
    digits = 3, forward_dots = forward_dots, dots2dots = dots2dots)
  cat("\n")
  print(out)
}
# In the last case, digits is forwarded to sum (as ...) 
# and wrongly included in the summation

SSBtools documentation built on June 19, 2025, 5:07 p.m.
