accumulate | R Documentation
Compute grouped aggregates. If a group fails to satisfy user-defined
conditions (for example, because it has too many missing values or too few
records), the group is expanded according to a user-defined 'collapsing'
scheme. This happens recursively until either the group satisfies all
conditions and the aggregate is computed, or no collapsing possibilities
remain, in which case NA is returned for that group.
accumulate aggregates over all non-grouping variables, that is, over all
columns of data that are not part of the collapsing scheme defined in
collapse.
cumulate uses a syntax akin to dplyr::summarise.
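The difference between the two interfaces can be illustrated in base R for a single, already selected group. This is a minimal sketch of the semantics under that assumption, not the package's implementation; the column names Y1 and Y2 and the aggregate names tY and mY2 are made up for illustration.

```r
# One selected group with two non-grouping columns (illustrative data)
d <- data.frame(Y1 = c(1, 2, 4), Y2 = c(NA, 2, 4))

# accumulate-style: a single function (plus extra arguments such as
# na.rm = TRUE) is applied to every non-grouping column
acc <- sapply(d, sum, na.rm = TRUE)
acc

# cumulate-style: named expressions are evaluated within the data,
# similar to the name = expression pairs of dplyr::summarise
exprs <- alist(tY = sum(Y1), mY2 = mean(Y2, na.rm = TRUE))
cum <- sapply(exprs, eval, envir = d)
cum
```

With accumulate every output column is the same function of an input column; with cumulate each output column can be an arbitrary expression involving several input columns.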
accumulate(data, collapse, test, fun, ...)
cumulate(data, collapse, test, ...)
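data | A data frame with the variables to aggregate and the grouping variable(s).
collapse | A formula or a data frame defining the collapsing scheme (see below).
test | A function that accepts a data frame and returns TRUE when the data is suitable for computing the aggregate, FALSE otherwise.
fun | (accumulate only) A function that is applied to each non-grouping column and returns a single value.
... | For accumulate, extra arguments passed to fun (for example na.rm = TRUE). For cumulate, comma-separated name = expression pairs defining the aggregates, as in dplyr::summarise.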
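A test function can combine several conditions, such as the two mentioned in the description (enough records, not too many missing values). The sketch below assumes, as in the examples, that test is called with a single data frame; the name enough_data, the column Y2 and the thresholds are hypothetical choices.

```r
# Hypothetical test function: accept a (possibly collapsed) group only if it
# has at least min_rows records and at most a fraction max_na of missing
# values in column Y2. The && short-circuits, so empty groups never reach
# mean(), which would return NaN for zero rows.
enough_data <- function(d, min_rows = 3, max_na = 0.5) {
  nrow(d) >= min_rows && mean(is.na(d$Y2)) <= max_na
}

d <- data.frame(Y1 = c(1, 2, 4), Y2 = c(NA, 2, 4))
enough_data(d)        # 3 records, 1 of 3 missing
enough_data(d[1, ])   # only 1 record
```

It could then be passed as test = enough_data.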
Both functions return a data frame where each row represents a (multivariate)
group. The first columns contain the grouping variables. The next column,
level, indicates the level of collapsing that was needed to compute a value,
where 0 means that no collapsing was necessary. The remaining columns contain
the aggregates (computed with fun for accumulate, or defined in the ...
argument for cumulate). If no amount of collapsing yields a data set that
passes test, then for that row the level and all subsequent columns are NA.
If all combinations of collapsing options are stored as columns in data, the
formula interface can be used. An example is the easiest way to see how it
works. Suppose that collapse = A*B ~ A1*B + B. This means:
1. Compute output for groups defined by variables A and B.
2. If for a certain combination (a,b) in AxB the data does not pass test,
   use (a1,b) in A1xB as the alternative combination to compute a value for
   (a,b). (A1xB must yield larger groups than AxB.)
3. If that does not work, use only B as a grouping variable to compute a
   value for (a,b).
4. If that does not work, return NA for that particular combination (a,b).
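The steps above can be sketched in base R for a single target combination. This is a hypothetical helper, not the package's implementation; it uses the data and the scheme A*B ~ A*B1 + A from the cumulate example rather than A*B ~ A1*B + B, and it assumes that the coarser grouping variables take a single value within the target group.

```r
input <- data.frame(
    A  = c(1,1,1,2,2,2,3,3,3)
  , B  = c(11,11,11,12,12,13,21,22,12)
  , B1 = c(1,1,1,1,1,1,2,2,1)
  , Y  = 2^(0:8)
)

# Hypothetical helper: try each set of grouping variables in `scheme`, from
# fine to coarse, until the selected records pass `test`.
select_group <- function(data, target, scheme, test) {
  # records of the finest target group, e.g. (A = 2, B = 12)
  in_target <- Reduce(`&`, Map(function(v, x) data[[v]] == x,
                               names(target), target))
  for (i in seq_along(scheme)) {
    vars <- scheme[[i]]
    # this level's grouping values for the target group (assumed constant)
    key  <- data[in_target, vars, drop = FALSE][1, , drop = FALSE]
    keep <- Reduce(`&`, lapply(vars, function(v) data[[v]] == key[[v]]))
    d <- data[keep, , drop = FALSE]
    if (test(d)) return(list(level = i - 1, data = d))
  }
  list(level = NA, data = NULL)  # no level passed the test
}

scheme <- list(c("A", "B"), c("A", "B1"), "A")   # A*B ~ A*B1 + A
res <- select_group(input, list(A = 2, B = 12), scheme,
                    test = function(d) nrow(d) >= 3)
res$level
sum(res$data$Y)
```

Here (A = 2, B = 12) has only two records, so the sketch falls back to the (A, B1) grouping, which selects three records.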
Generally, the formula must be of the form X0 ~ X1 + X2 + ... + Xn, where
each Xi is a (product of) grouping variable(s) in the data set.
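To make the general form concrete, the following hypothetical helper (not part of the package) turns a formula X0 ~ X1 + ... + Xn into the ordered list of grouping-variable sets it describes, using only base R's formula structure and all.vars().

```r
# Hypothetical helper: list the grouping variables of X0, X1, ..., Xn,
# from the finest grouping (left-hand side) to the coarsest.
collapse_sequence <- function(f) {
  # split an expression X1 + X2 + ... + Xn into its terms, left to right
  split_plus <- function(e) {
    if (is.call(e) && identical(e[[1]], as.name("+"))) {
      c(split_plus(e[[2]]), split_plus(e[[3]]))
    } else {
      list(e)
    }
  }
  terms <- c(list(f[[2]]), split_plus(f[[3]]))  # f[[2]]: lhs, f[[3]]: rhs
  # all.vars() extracts the variable names of a (product of) variable(s)
  lapply(terms, all.vars)
}

collapse_sequence(A*B ~ A1*B + B)
```

For the example above this yields the sets {A, B}, {A1, B} and {B}, in the order in which they are tried.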
If the collapsing scheme is instead defined by a data frame, collapse has
columns [A0, A1, ..., An], where each row maps a value of A0 to the coarser
groups it belongs to in A1, ..., An. The variable A0 represents the most
fine-grained grouping and must also be present in data. Aggregation works as
follows.
1. Compute output for groups defined by variable A0.
2. If for a certain a0 in A0 the corresponding selected data does not pass
   test, use the larger data set corresponding to the value a1 in A1 that a0
   maps to, and use it to compute output for a0.
3. Repeat the second step with A2, ..., An until either test is passed or
   no more collapsing is possible. In the latter case, return NA for that
   particular value of a0.
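The algorithm above can be sketched in base R, using the data and collapsing scheme of the first example. The helper collapse_one is hypothetical and only approximates the documented behaviour (one row per A0 value, a level column, then the aggregates); it is not the package's code.

```r
input <- data.frame(Y1 = 2^(0:8), Y2 = 2^(0:8))
input$Y2[c(1,4,7)] <- NA
input$A0 <- c(123,123,123,135,136,137,212,213,225)
collapse <- data.frame(
    A0 = c(123, 135, 136, 137, 212, 213, 225)
  , A1 = c(12 , 13 , 13 , 13 , 21 , 21 , 22 )
  , A2 = c(1  , 1  , 1  , 1  , 2  , 2  , 2  )
)

# Hypothetical helper: aggregate the data for one value a0, collapsing along
# A1, ..., An until `test` passes.
collapse_one <- function(a0, data, collapse, test, fun, ...) {
  key  <- names(collapse)[1]               # finest grouping variable (A0)
  vars <- setdiff(names(data), key)        # columns to aggregate over
  row  <- collapse[collapse[[key]] == a0, , drop = FALSE]
  for (level in seq_along(collapse) - 1) {
    col <- names(collapse)[level + 1]
    # all A0 values that share a0's group at this level
    members <- collapse[[key]][collapse[[col]] == row[[col]]]
    d <- data[data[[key]] %in% members, vars, drop = FALSE]
    if (test(d)) return(c(level = level, sapply(d, fun, ...)))
  }
  # no collapsing level passed the test: level and aggregates are NA
  c(level = NA, setNames(rep(NA_real_, length(vars)), vars))
}

# One row per A0 value: grouping variable, level, aggregates
rows <- lapply(collapse$A0, collapse_one, data = input, collapse = collapse,
               test = function(d) nrow(d) >= 3, fun = sum, na.rm = TRUE)
out <- data.frame(A0 = collapse$A0, do.call(rbind, rows))
out
```

For example, 212 has one record and shares A1 = 21 only with 213 (two records in total), so the sketch has to collapse to A2 = 2 before the test passes.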
MPJ van der Loo (2025). Split-Apply-Combine with Dynamic Grouping. Journal of
Statistical Software. doi:10.18637/jss.v112.i04
library(accumulate)
## Example of data frame defining collapsing scheme, using accumulate
input <- data.frame(Y1 = 2^(0:8), Y2 = 2^(0:8))
input$Y2[c(1,4,7)] <- NA
# make sure that the input data also has the most fine-grained (target)
# grouping variable
input$A0 <- c(123,123,123,135,136,137,212,213,225)
# define collapsing sequence
collapse <- data.frame(
A0 = c(123, 135, 136, 137, 212, 213, 225)
, A1 = c(12 , 13 , 13 , 13 , 21 , 21 , 22 )
, A2 = c(1 , 1 , 1 , 1 , 2 , 2 , 2 )
)
accumulate(input
, collapse
, test = function(d) nrow(d)>=3
, fun = sum, na.rm=TRUE)
## Example of formula defining collapsing scheme, using cumulate
input <- data.frame(
A = c(1,1,1,2,2,2,3,3,3)
, B = c(11,11,11,12,12,13,21,22,12)
, B1 = c(1,1,1,1,1,1,2,2,1)
, Y = 2^(0:8)
)
cumulate(input, collapse=A*B ~ A*B1 + A
, test = function(d) nrow(d) >= 3
, tY = sum(Y))
## Example with formula defining collapsing scheme, using accumulate
# The collapsing scheme must be represented by variables in the
# data. All columns not part of the collapsing scheme will be aggregated
# over.
input <- data.frame(
A = c(1,1,1,2,2,2,3,3,3)
, B = c(11,11,11,12,12,13,21,22,12)
, B1 = c(1,1,1,1,1,1,2,2,1)
, Y1 = 2^(0:8)
, Y2 = 2^(0:8)
)
input$Y2[c(1,4,7)] <- NA
accumulate(input
, collapse = A*B ~ A*B1 + A
, test=function(a) nrow(a)>=3
, fun = sum, na.rm=TRUE)
## Example with data.frame defining collapsing scheme, using cumulate
dat <- data.frame(A0 = c("11","12","11","22"), Y = c(2,4,6,8))
# collapsing scheme
csh <- data.frame(
A0 = c("11","12","22")
, A1 = c("1" ,"1", "2")
)
cumulate(data = dat
, collapse = csh
, test = function(d) nrow(d) >= 2
, mn = mean(Y, na.rm=TRUE)
, md = median(Y, na.rm=TRUE)
)