class: inverse, left, bottom background-image: url(https://images.unsplash.com/photo-1543286386-713bdd548da4?ixlib=rb-1.2.1&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=1470&q=80) background-size: cover

.Large[tidypivot]

.small[Some ideas]

.tiny[Dr. Evangeline Reynolds | 2022-04-26 |Image credit: William Iven, Upsplash]

??? Title slide

# This is the recommended set up for flipbooks
# you might think about setting cache to TRUE as you gain practice --- building flipbooks from scratch can be time consuming
knitr::opts_chunk$set(fig.width = 6, message = FALSE, warning = FALSE, comment = "", cache = F)
library(flipbookr)
library(tidyverse)

```{css, eval = TRUE, echo = FALSE} .remark-code{line-height: 1.5; font-size: 100%}

@media print { .has-continuation { display: block; } }

code.r.hljs.remark-code{ position: relative; overflow-x: hidden; }

code.r.hljs.remark-code:hover{ overflow-x:visible; width: 500px; border-style: solid; }

# Step 00. prep some data, records and flat data frame

```r
library(tidyverse)
library(magrittr)

Titanic %>% 
  data.frame() %>% 
  uncount(weights = Freq) ->
tidy_titanic ; tidy_titanic %>% head()

Titanic %>% 
  data.frame() ->
flat_titanic ; flat_titanic %>% head()

Step 0. Some observations:

--

ggplot2: user needs to describe layout of graphic...


???


Lots of data science is counting things...

--

Less experience in this world


But a graphic may actually look a lot like a table!

horizontal position variables are both categorical...

you can make a visual pivot table in ggplot2; analyst job is to describe the form. How will it look

specify 3 things - start with visual layout

tidy_titanic %>% 
  ggplot() + 
  aes(x = Sex, y = Survived) + 
  geom_jitter() + 
  geom_count(color = "blue")

But sometimes you may actually not need a graphic, but rather the raw numbers.

--

What does getting there look like using the tidyverse tools?


tidy_titanic %>% 
  group_by(Sex, Survived) %>% 
  summarize(count = n()) %>% 
  pivot_wider(names_from = Survived, 
              values_from = count)

With existing pivot tools, description isn't so visual

--

can we make a more visual, declarative API?


Step 1a. Make Functions to allow description of final table, pivot_count and pivot_calc

x argument is horizontal elements (columns) and y is vertical elements (rows)

pivot_count_script <- readLines("./R/pivot_count.R")



pivot_calc_script <- readLines("./R/pivot_calc.R")


Step 1b. Using those functions

# rows and cols
tidy_titanic %>% 
  pivot_count(rows = Survived, cols = Sex) 

# cols only
tidy_titanic %>% 
   pivot_count(cols = Sex)

# rows only
tidy_titanic %>% 
  pivot_count(rows = Survived) 

# two rows and col
tidy_titanic %>% 
  pivot_count(rows = c(Survived, Class), cols = Sex)

# two rows and col and contains zero counts
tidy_titanic %>% 
  pivot_count(rows = c(Survived, Class), cols = c(Sex, Age))

# two rows and col and contains zero counts
tidy_titanic %>% 
  pivot_count(rows = c(Survived, Class), cols = c(Sex, Age), pivot = F)

# count all
tidy_titanic %>% 
   pivot_count()

# for fun organize like it will appear visually in code
tidy_titanic %>% 
  pivot_count(                          cols = Sex, 
              rows = c(Survived, Class)        )

After examining your table you might actually want to have the calculation in long form (for use in something like ggplot2). This is what pivot = F is for!

tidy_titanic %>% 
  pivot_count(cols = Sex, rows = Survived, pivot = F)

1b. pivot_calc using pivot calc function for non count aggregation

just for fun arrange the code how the table will look

flat_titanic %>%
  pivot_calc(              cols = Sex, 
             rows = Survived, value = Freq, fun = sum)

flat_titanic %>% 
  pivot_count(cols = Sex, 
             rows = Survived, wt = Freq)

Issue: For this case, we should probably use pivot_count and allow for a wt argument.

1b style. use another tool to style

goal of functions is not to style - just to make calculation faster by using a visually driven API

tidy_titanic %>%  
  pivot_count(cols = Sex, rows = c(Survived, Class)) %>% 
  group_by(Class) %>% 
  gt::gt()
tidy_titanic %>% 
  pivot_count(cols = Sex, rows = c(Survived, Class, Age)) %>% 
  group_by(Age) %>% 
  gt::gt()

Back to Step 0, Observations: use existing tools to calculate proportions is many step process

feels like lots of gymnastics... a vis first approach is what we are after

tidy_titanic %>% 
  group_by(Sex, Survived) %>% 
  summarize(value = n()) %>% 
  group_by(Sex) %>% 
  mutate(prop = value/sum(value)) %>% 
  select(-value) %>% 
  pivot_wider(values_from = prop, names_from = Sex)

Step 2a. build a function where visual arrangement leads.

pivot_prop_script <- readLines("./R/pivot_prop.R")


Step 2b. using the pivot_prop

tidy_titanic %>% 
  pivot_prop(rows = Survived, cols = Class, within = Class)

tidy_titanic %>% 
  pivot_prop(rows = c(Survived, Sex), 
               cols = Class, 
               within = Survived)

tidy_titanic %>% 
  pivot_prop(rows = c(Survived, Sex), 
               cols = Class, 
               within = c(Survived, Sex))

tidy_titanic %>% 
  pivot_prop(rows = c(Survived, Sex), 
               cols = Class, 
               within = Survived, pivot = F) %>% 
  ggplot() +
  aes(x = Class, y = Sex) +
  facet_grid(rows = vars(Survived)) +
  geom_tile() +
  aes(fill = prop) + 
  geom_text(aes(label = prop %>% round(3)))
tidy_titanic %>% 
  pivot_prop(rows = c(Survived, Sex), 
               cols = Class, 
               within = c(Survived, Sex), pivot = F) %>% 
  ggplot() +
  aes(x = Class, y = 1) +
  facet_grid(rows = vars(Survived, Sex)) +
  geom_tile() +
  aes(fill = prop) + 
  geom_text(aes(label = prop %>% round(3)))

Reflections, questions, issues

  1. Does this already exist?
  2. ~~How can API improve? possibly rows = vars(y00, y0, y), cols = vars(x). and within = vars(?, ?) This requires more digging into tidy eval. What about multiple x vars?~~ These changes implemented thanks to Brian and Shannon
  3. ~~How can internals improve? Also tidy eval is old I think. defaults for missing data.~~ Using new {{}} tidy eval within and across, and rlang::quo_is_null() thanks to Brian
  4. What about summary columns, rows. Column totals, etc. Maybe not very tidy... maybe allowed w/ error message.
  5. Ideas about different API - more like ggplot2, where you would specify new dimensions of your table after piping. Would require function to create non-data frame type object. Not sure about consistency dplyr/tidyr tools. These just return dataframes/tibble. I think being consistent with that expectation might be a good thing. Also less challenging to implement.

library(ggstamp)
set.seed(128790)
ggcanvas() + 
  stamp_tile(x = rnorm(500)/2.5, 
             y= rnorm(500)/2.5, 
             fill = "darkslateblue",
             alpha = runif(500, .25, 1),
             width = rnorm(500)/3,
             height = rnorm(500)/3, 
             color = alpha("black", .75)) + 
  stamp_point(size = 8, color = "grey85") + 
  stamp_spoke(angle = pi, size = 1.7, 
              radius = .65, color = "grey85") +
  stamp_point(size = 5, color = "grey85") + 
  stamp_point(size = 3, color = "grey55") + 

  stamp_polygon_holes(y0 = .25) + 
  stamp_text(label = "tidy", hjust = 1,
             color = "violetred", 
               vjust = -.2, size = 17) + 
  stamp_text(label = "pivot", hjust = -.1,
             angle = 0:25, color = "violetred", 
               vjust = -.2, size = 17, alpha = c(0:25/400)) + 
  stamp_spoke(angle = .45*0:25/25, size = 1.7, 
              radius = .85, color = "grey55", alpha = c(0:25/400)) +
  stamp_spoke(angle = .45, size = 1.7, 
              radius = .85, color = "grey55") +
  stamp_text(label = "pivot", hjust = -.1,
             angle = 25, color = "violetred", 
               vjust = -.2, size = 17) +
  ggstamp::theme_void_fill(fill = "darkslateblue") + 
  stamp_polygon(color = "lightslateblue", 
                alpha = 0, y0 = .25, size = 4)

Other work in this space


janitor::tably

tidy_titanic %>% 
  janitor::tabyl(Sex, Survived) %>% 
  janitor::adorn_pct_formatting()


EvaMaeRey/tidypivot documentation built on Feb. 27, 2025, 4:04 a.m.