Nothing
knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) suppressPackageStartupMessages(library(dplyr)) suppressPackageStartupMessages(library(rollup))
rollup
: A Tidy implementation of GROUPING SETS
, WITH ROLLUP
, and WITH CUBE
, which are powerful extensions of the GROUP BY
clause that compute multiple group-by clauses in a single statement in SQL
. This package operates on top of the dplyr
and performs the same functions as SQL
.
# From CRAN install.packages("rollup") # From Github library(devtools) devtools::install_github("JuYoungAhn/rollup")
rollup
package allow you to simplify multiple group_by
operations into a single, concise statement. mtcars %>% group_by(vs, am) %>% grouping_sets("vs","am",c("vs","am"), NA) %>% summarize(n=n(), avg_mpg=mean(mpg)) mtcars %>% group_by(vs, am) %>% with_rollup() %>% summarize(n=n(), avg_mpg=mean(mpg)) mtcars %>% group_by(vs, am) %>% with_cube() %>% summarize(n=n(), avg_mpg=mean(mpg))
library(dplyr) library(rollup) data("web_service_data") # web_service_data of rollup package web_service_data %>% head
grouping_sets()
allows you to perform multiple group_by
operations simultaneously, producing combined results in a single output.grouping_sets('a')
is equivalent to the single grouping set operation group_by(a)
.grouping_sets('a','b')
is equivalent to row binding of group_by(a)
and group_by(b)
.grouping_sets(c('a','b'),'a','b', NA)
is equivalent to row binding of group_by(a,b)
, group_by(a)
, group_by(b)
and without group_by
operation. library(tidyr) # compute average of `page_view_cnt` group by "gender", "age", and "gender & age", along with the overall average. NA in the output table represents overall aggregates. web_service_data %>% filter(date_id == '2024-06-30' & gender != "N") %>% group_by(gender, age) %>% grouping_sets('gender', 'age', c('gender','age'), NA) %>% summarize(avg_pv_cnt = mean(page_view_cnt)) # compute average of `page_view_cnt` group by "gender & age & product_view_cnt_cat" along with the marginal average with regard to "product_view_cnt_cat". web_service_data %>% filter(date_id == '2024-06-30' & gender != "N") %>% group_by(gender, age, product_view_cnt_cat) %>% grouping_sets('product_view_cnt_cat', c('product_view_cnt_cat', 'gender','age')) %>% summarize(avg_pv_cnt = mean(page_view_cnt)) %>% pivot_wider(names_from = product_view_cnt_cat, values_from = avg_pv_cnt)
with_cube()
automatically generates all possible combinations of specified variables in group_by
clause. with_cube()
function is a simplified way of expressing grouping_sets()
. with_cube()
is equivalent to using grouping_sets()
with all combinations of the specified columns.group_by(a,b,c)
followed by with_cube()
equals to grouping_sets(c('a','b','c'), c('a','b'), c('a','c'), c('b','c'), 'a', 'b', 'c', NA)
.with_cube()
is particularly useful when you want to include total aggregates of both rows and columns in a cross table.# This produces a table with average page view counts grouped by gender and age, including total aggregates across all combinations. web_service_data %>% filter(date_id == '2024-06-30' & gender != "N") %>% group_by(gender, age) %>% with_cube() %>% summarize(avg_pv_cnt = mean(page_view_cnt)) %>% pivot_wider(names_from = age, values_from = avg_pv_cnt)
with_rollup()
creates hierarchical aggregations by progressively reducing the number of grouping variables.with_rollup()
is particulary useful when variables have a hierarchy, because all possible combinations are not necessary.group_by(a,b)
followed by with_rollup()
equals to grouping_sets(c('a','b'), 'a', NA)
.group_by(a,b,c)
followed by with_rollup()
equals to grouping_sets(c('a','b','c'), ('a','b'), ('a'), NA)
.# The variables "age_big" and "age" have a hierarchy. web_service_data_processed <- web_service_data %>% mutate( age_big = case_when( age %in% c(10,20,30) ~ 'young', age %in% c(40,50,60) ~ 'old' ) ) # If there are aggregates "age_big & age", marginal aggregates for "age" are not necessary. # The following code computes aggregates for "age_big & age", "age_big", and entire data set. web_service_data_processed %>% group_by(age_big, age) %>% with_rollup() %>% summarize( user_cnt = n_distinct(id), avg_pv_cnt = mean(page_view_cnt) )
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.