View source: R/aggregate_multiple_fun.R
aggregate_multiple_fun | R Documentation |
Wrapper to aggregate that allows multiple functions and functions of several variables.
aggregate_multiple_fun(
  data,
  by,
  vars,
  fun = NULL,
  ind = NULL,
  ...,
  name_sep = "_",
  seve_sep = ":",
  multi_sep = ",",
  forward_dots = FALSE,
  dots2dots = FALSE,
  do_unmatrix = TRUE,
  do_unlist = TRUE,
  inc_progress = FALSE
)
data |
A data frame containing data to be aggregated |
by |
A data frame defining grouping |
vars |
A named vector or list of variable names in
|
fun |
A named list of functions. These names will be used as suffixes in output variable names. Name can be omitted for one function.
A vector of function as strings is also possible. When unnamed, these function names will be used directly.
See the examples of |
ind |
When non-NULL, a data frame of indices.
When NULL, this variable will be generated internally as |
... |
Further arguments passed to |
name_sep |
A character string used when output variable names are generated. |
seve_sep |
A character string used when output variable names are generated from functions of several variables. |
multi_sep |
A character string used when multiple output variable names are sent as input. |
forward_dots |
Logical vector (possibly recycled) for each element of |
dots2dots |
Logical vector (possibly recycled) specifying the behavior when |
do_unmatrix |
By default ( |
do_unlist |
By default ( |
inc_progress |
logical, |
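To make the role of multi_sep more concrete, here is a minimal base R sketch of splitting one combined name into several output names. It only illustrates the idea and is not the package's internal code; the input values are invented, and my_range is borrowed from the examples.

```r
# Assumption: a combined name such as "freqmin,freqmax" is split on multi_sep
multi_sep <- ","
out_names <- strsplit("freqmin,freqmax", multi_sep, fixed = TRUE)[[1]]

# A function with two outputs, as in the examples
my_range <- function(x) c(min = min(x), max = max(x))

# Attach the user-specified names to the two outputs (invented input values)
res <- setNames(my_range(c(4, 1, 7)), out_names)
print(res)
```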
One intention of aggregate_multiple_fun is to be a true generalization of aggregate. However, when many functions are involved, passing extra parameters can easily lead to errors. Therefore, forward_dots and dots2dots are set to FALSE by default.
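For comparison, the following base R sketch shows the behavior being generalized: aggregate applies one FUN to every variable. The data frame toy is invented for illustration.

```r
# Invented toy data: one grouping variable and two measurement variables
toy <- data.frame(
  g = c("a", "a", "b"),
  freq = c(1L, 2L, 3L),
  y = c(0.5, 1.5, 2)
)

# Base aggregate: the same function (sum) is applied to both freq and y
res <- aggregate(toy[c("freq", "y")], by = toy["g"], FUN = sum)
print(res)
```

Applying different functions to different variables would require several such calls and a merge, which is the kind of situation the wrapper is meant for.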
When forward_dots = TRUE and dots2dots = FALSE, parameters will be forwarded, but only parameters that are explicitly defined in the specific fun function. For the sum function, this means that a possible na.rm parameter is forwarded but not others. When forward_dots = TRUE and dots2dots = TRUE, other parameters will also be forwarded to fun functions where ... is included. For the sum function, this means that such extra parameters will, probably erroneously, be included in the summation (see examples).
For the function to work with dummy_aggregate, the data is subject to unlist before the fun functions are called. This does not apply in the special case where ind is a two-column data frame. Then, in the case of list data, the fun functions have to handle this themselves.
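A small base R sketch of why unlist matters for list data; the list x is invented and stands in for the values of one group.

```r
# Invented list data for a single group
x <- list(1:2, 3)

# Many functions, such as sum, cannot be applied to a list directly
res_err <- tryCatch(sum(x), error = function(e) "error")

# After unlist, the values form an ordinary vector
res_ok <- sum(unlist(x))
```

A fun function that receives list data without this step would need to call unlist, or an equivalent, itself.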
A limitation of the default output, when do_unlist = TRUE, is that all output variables are forced to have the same class. This is caused by the unlist function being run on the output. This means, for example, that all the variables will become numeric when some should have been integer and others numeric.
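The coercion can be reproduced in base R. The list out below is invented and merely stands in for one row of aggregated output.

```r
# Invented output with two integer results and one double result
out <- list(n = 2L, total = 3L, mean_y = 0.25)
print(sapply(out, class))

# unlist coerces everything to a common type, here double
flat <- unlist(out)
print(class(flat))

# With integers only, the integer type is preserved
all_int <- unlist(list(n = 2L, total = 3L))
print(class(all_int))
```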
A data frame
d2 <- SSBtoolsData("d2")
set.seed(12)
d2$y <- round(rnorm(nrow(d2)), 2)
d <- d2[sample.int(nrow(d2), size = 20), ]

aggregate_multiple_fun(
  data = d,
  by = d[c("k_group", "main_income")],
  vars = c("freq", "y", median = "freq", median = "y", e1 = "freq"),
  fun = c(sum, median = median, e1 = function(x) x[1])
)

# With functions as named strings
aggregate_multiple_fun(
  data = d,
  by = d[c("k_group", "main_income")],
  vars = c(sum = "y", med = "freq", med = "y"),
  fun = c(sum = "sum", med = "median")
)

# Without specifying functions
# - equivalent to `fun = c("sum", "median")`
aggregate_multiple_fun(
  data = d,
  by = d[c("k_group", "main_income")],
  vars = c(sum = "y", median = "freq", median = "y")
)

# The single unnamed variable feature. Also functions as strings.
aggregate_multiple_fun(
  data = d,
  by = d[c("k_group", "main_income")],
  vars = "y",
  fun = c("sum", "median", "min", "max")
)

# with multiple outputs (function my_range)
# and with function of two variables (weighted.mean(y, freq))
my_range <- function(x) c(min = min(x), max = max(x))
aggregate_multiple_fun(
  data = d,
  by = d[c("k_group", "main_income")],
  vars = list("freq", "y", ra = "freq", wmean = c("y", "freq")),
  fun = c(sum, ra = my_range, wmean = weighted.mean)
)

# with specified output variable names
my_range <- function(x) c(min = min(x), max = max(x))
aggregate_multiple_fun(
  data = d,
  by = d[c("k_group", "main_income")],
  vars = list("freq", "y",
              `freqmin,freqmax` = list(ra = "freq"),
              yWmean = list(wmean = c("y", "freq"))),
  fun = c(sum, ra = my_range, wmean = weighted.mean)
)

# To illustrate forward_dots and dots2dots
q <- d[1, ]
q$w <- 100 * rnorm(1)
for (dots2dots in c(FALSE, TRUE)) for (forward_dots in c(FALSE, TRUE)) {
  cat("\n=======================================\n")
  cat("forward_dots =", forward_dots, ", dots2dots =", dots2dots)
  out <- aggregate_multiple_fun(
    data = q, by = q["k_group"],
    vars = c(sum = "freq", round = "w"), fun = c("sum", "round"),
    digits = 3, forward_dots = forward_dots, dots2dots = dots2dots)
  cat("\n")
  print(out)
}
# In last case digits forwarded to sum (as ...)
# and wrongly included in the summation