cross_by_dimensions: Cross by dimensions


View source: R/cross-dimensions.R

Description

For each dimension column passed in ..., this function stacks an extra copy of the table in which that column's value is replaced with the word "All", and then groups by all of those columns. Each copy is stacked on the result of the previous step, so the output covers every combination of specific values and "All". That gives summaries for each individual dimension, for each combination of dimensions, and overall. It acts as an extended group_by() and works both on database-backed tables and on in-memory data frames.
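The stacking described above can be illustrated with a rough base-R sketch. The helper cross_sketch() is hypothetical and is not the package's implementation. It converts each dimension to character (an assumption of this sketch, so that "All" fits in the column) and appends an "All" copy for each dimension. It then summarizes with aggregate(), standing in for group_by() and summarize().

```r
# Hypothetical helper: a base-R sketch of the stacking idea, not tidymetrics' code.
cross_sketch <- function(df, dims) {
  for (d in dims) {
    df[[d]] <- as.character(df[[d]])  # assumption: "All" must fit the column type
    all_copy <- df
    all_copy[[d]] <- "All"            # this copy summarizes across dimension d
    df <- rbind(df, all_copy)         # original rows followed by the "All" rows
  }
  df
}

# Toy data with made-up values (not the real mtcars rows)
toy <- data.frame(cyl = c(4, 6, 8), am = c(0, 1, 1), mpg = c(30, 20, 15))
crossed <- cross_sketch(toy, c("cyl", "am"))
nrow(crossed)  # 12 = 3 rows x 2^2 copies

# aggregate() plays the role of group_by() + summarize() here
summary_all <- aggregate(mpg ~ cyl + am, data = crossed, FUN = mean)
summary_all
```

Crossing two dimensions quadruples the row count. The aggregated result contains per-cyl, per-am, and per-pair rows, plus one overall row where both columns are "All".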

Usage

cross_by_dimensions(
  tbl,
  ...,
  add = TRUE,
  max_dimensions = NULL,
  collect_fun = NULL
)

Arguments

tbl

A table: an in-memory data frame or a database-backed table.

...

A selection of the dimension columns to cross by.

add

Whether to keep any existing groups in addition to the new dimension groups, rather than replacing them. Defaults to TRUE.

max_dimensions

The maximum number of dimensions that can hold a specific (non-"All") value in any one row. Setting it reduces the size of the resulting metrics table by dropping combinations in which more than max_dimensions columns are something other than "All". The default, NULL, applies no limit.
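The effect of max_dimensions on which rows survive can be sketched in base R. The limit_dims() helper below is hypothetical. It filters an already-crossed table after the fact, keeping rows where at most max_dims of the dimension columns differ from "All". The package may well prune combinations earlier, before summarizing, so treat this only as an illustration of the resulting rows.

```r
# Hypothetical sketch: keep only rows where at most `max_dims` of the
# dimension columns hold a specific (non-"All") value.
limit_dims <- function(df, dims, max_dims) {
  n_specific <- rowSums(df[dims] != "All")  # count non-"All" dimensions per row
  df[n_specific <= max_dims, , drop = FALSE]
}

# Every combination of one specific value and "All" for two dimensions
grid <- expand.grid(carrier = c("AA", "All"), origin = c("JFK", "All"),
                    stringsAsFactors = FALSE)
limited <- limit_dims(grid, c("carrier", "origin"), max_dims = 1)
limited
```

With max_dims = 1 on two dimensions, the carrier-origin pair row is dropped. What remains is the per-carrier, per-origin, and overall rows, which is the intent of the max_dimensions = 1 flight example.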

collect_fun

A function used to collect or materialize intermediate tables. This helps with large tables, where the generated SQL queries can become very complex and expensive to execute. Materializing intermediate results as temporary tables can make the query more efficient.

See Also

discard_dimensions()

Examples

# Data frame
library(dplyr)
library(tidymetrics)

# Average mpg by cyl, by am, by each cyl-am pair, and overall
mtcars %>%
  cross_by_dimensions(cyl, am) %>%
  summarize(avg_mpg = mean(mpg))

flights <- nycflights13::flights %>%
  mutate(date = as.Date(ISOdate(year, month, day)))

# Find flight counts and average arrival delays by carrier, by origin,
# by each carrier-origin pair, and overall
flight_summary <- flights %>%
  cross_by_dimensions(carrier, origin) %>%
  summarize(
    nb_flights = n(),
    avg_arr_delay = mean(arr_delay, na.rm = TRUE)
  )

flight_summary

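# Allow at most one non-"All" dimension per row: by carrier, by origin,
# and overall, but no carrier-origin pairs
flight_summary <- flights %>%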
  cross_by_dimensions(carrier, origin, max_dimensions = 1) %>%
  summarize(
    nb_flights = n(),
    avg_arr_delay = mean(arr_delay, na.rm = TRUE)
  )

flight_summary

# This works well combined with discard_dimensions(), which keeps only the
# rows where the given column is "All" and then removes that column

# Look just by carrier
flight_summary %>%
  discard_dimensions(origin)

# Look just by origin
flight_summary %>%
  discard_dimensions(carrier)
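The discard step used in these examples (keep the "All" level of one dimension, then drop that column) can be mimicked in base R. The discard_sketch() helper and the small summary table below are hypothetical, with made-up values. They are not the package's code or real nycflights13 results.

```r
# Hypothetical base-R sketch of the discard step (not the package's code):
# keep rows where `dim` is "All", then drop that column.
discard_sketch <- function(df, dim) {
  kept <- df[df[[dim]] == "All", , drop = FALSE]
  kept[[dim]] <- NULL  # the column is constant ("All"), so remove it
  kept
}

# Made-up crossed summary with carrier and origin dimensions
summary_toy <- data.frame(
  carrier    = c("AA", "UA", "All", "AA", "All"),
  origin     = c("All", "All", "All", "JFK", "JFK"),
  nb_flights = c(10, 20, 30, 4, 9),
  stringsAsFactors = FALSE
)
by_carrier <- discard_sketch(summary_toy, "origin")
by_carrier
```

The result keeps one row per carrier plus the overall "All" row, and no longer has an origin column.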

datacamp/tidymetrics documentation built on March 21, 2021, 3:28 a.m.