class: fullscreen, inverse, top, center, text-white background-image: url("images/letter-g.jpg")

.font150[Brought to you by the letter...]

knitr::opts_chunk$set(fig.width=4.25, fig.height=3.5, fig.retina=3,
                      message=FALSE, warning=FALSE, cache = TRUE,
                      autodep = TRUE, hiline=TRUE)
knitr::opts_hooks$set(fig.callout = function(options) {
  if (options$fig.callout) {
    options$echo <- FALSE
    options$out.height <- "99%"
    options$fig.width <- 16
    options$fig.height <- 8
  }
  options
})
hook_source <- knitr::knit_hooks$get('source')
knitr::knit_hooks$set(source = function(x, options) {
  if (!is.null(options$hiline) && options$hiline) {
    x <- stringr::str_replace(x, "^ ?(.+)\\s?#<<", "*\\1")
  }
  hook_source(x, options)
})
options(htmltools.dir.version = FALSE, width = 90)
as_table <- function(...) knitr::kable(..., format='html', digits = 3)

layout: true

Why ggplot2?


.left-column[ Hadley Wickham ]

.right-column[.font150[ The transferrable skills from ggplot2 are not the idiosyncracies of plotting syntax, but a powerful way of thinking about visualisation, as a way of mapping between variables and the visual properties of geometric objects that you can perceive. ] .footnote[http://disq.us/p/sv640d] ]


## My personal reasons


layout: false

What are we getting into?


ggplot2 is a huge package: philosophy + functions
...but it's very well organized

--



Lots of examples of not-so-great plots in these slides
...but that's okay

--



Going to throw a lot at you
...but you'll know where and what to look for

--

.img-right[]

--

.img-right[]


layout: true

G is for getting started


Easy: install the tidyverse

install.packages('tidyverse')

Medium: install just ggplot2

install.pacakages('ggplot2')

Expert: install from GitHub

devtools::install_github('tidyverse/ggplot2')

Load the tidyverse

library(tidyverse)

Other packages you'll need for this adventure

library(lubridate)       # tidyverse
library(reshape2)        # install.packages("reshape2")
library(babynames)       # install.packages("babynames")

layout: true

gg is for Grammar of Graphics

.left-column[ ### Data

r g <- ggplot() ]


.right-column[ #### Tidy Data

1. Each variable forms a .hl[column]

2. Each observation forms a .hl[row]

3. Each observational unit forms a table

<br><br>The following example draws from

```r
data(population, package = "tidyr")
```

]

.right-column[ ```r pop_simple <- population %>% filter( country %in% c("United States of America", "Canada", "China"), year %in% seq(1995, 2010, 5) ) %>% mutate( country = ifelse(country == "United States of America", "USA", country) )

messy1 <- pop_simple %>% mutate(population = population/10^6) %>% spread(year, population)
messy2 <- pop_simple %>% mutate(population = population/10^6) %>% spread(country, population)

knitr::kable(messy1, format = 'html')
cat('\n<br>\n')
knitr::kable(messy2, format = 'html')
```
]

.right-column[ r tidy1 <- gather(messy1, 'year', 'population', -country) as_table(tidy1) ]


.right-column[ r tidy2 <- gather(messy2, 'country', 'population', -year) as_table(tidy2) ]


layout: true

gg is for Grammar of Graphics

.left-column[ ### Data ### Aesthetics

r g + aes()

]

.right-column[ Map data to visual elements or parameters

- year

- population

- country
]

.right-column[ Map data to visual elements or parameters

- year → **x**

  - population → **y**

  - country → *shape*, *color*, etc.
]

layout: true

gg is for Grammar of Graphics

.left-column[ ### Data ### Aesthetics ### Geoms

r g + geom_*() ]


Geometric objects displayed on the plot:

.font80[ | Type | Function | |:----:|:--------:| | Point | geom_point() | | Line | geom_line() | | Bar | geom_bar(), geom_col() | | Histogram | geom_histogram() | | Regression | geom_smooth() | | Boxplot | geom_boxplot() | | Text | geom_text() | | Vert./Horiz. Line | geom_{vh}line() | | Count | geom_count() | | Density | geom_density() | ]


.right-column[ Those are just the top 10 most popular geoms1

  See <http://ggplot2.tidyverse.org/reference/> for many more options

Or just start typing `geom_` in RStudio

.font70[
  ```r
  # geom_
  old_width = options(width = 80)
  lsf.str("package:ggplot2") %>% grep("^geom_", ., value = TRUE)
  options(width = old_width$width)
  ```
  ]]

.footnote[[1] https://eric.netlify.com/2017/08/10/most-popular-ggplot2-geoms/]


layout: true

Our first plot!


.left-code[ r ggplot(tidy1) ]

.right-plot[ r ]


.left-code[ r ggplot(tidy1) + aes(x = year, #<< y = population) #<< ]

.right-plot[ r ]


.left-code[ r ggplot(tidy1) + aes(x = year, y = population) + geom_point() #<< ]

.right-plot[ r ]


.left-code[ r ggplot(tidy1) + aes(x = year, y = population, color = country) + #<< geom_point() ]

.right-plot[ r ]


.left-code[ r ggplot(tidy1) + aes(x = year, y = population, color = country) + geom_point() + geom_line() #<<

.font80[
  ```r
  geom_path: Each group consists
  of only one observation.
  Do you need to adjust the
  group aesthetic?
    ```
  ]
]

.right-plot[ r ]


.left-code[ r ggplot(tidy1) + aes(x = year, y = population, color = country) + geom_point() + geom_line( aes(group = country)) #<< ]

.right-plot[ r ]


layout: true

gg is for Grammar of Graphics

.left-column[ ### Data ### Aesthetics ### Geoms

r g + geom_*() ]


.right-column[ r geom_*(mapping, data, stat, position)

- `data` Geoms can have their own data
- Has to map onto global coordinates

- `map` Geoms can have their own aesthetics
- Inherits global aesthetics
- Have geom-specific aesthetics
- `geom_point` needs `x` and `y`, optional `shape`, `color`, `size`, etc.
- `geom_ribbon` requires `x`, `ymin` and `ymax`, optional `fill`
- `?geom_ribbon`
]

.right-column[ r geom_*(mapping, data, stat, position)

- `stat` Some geoms apply further transformations to the data
- All respect `stat = 'identity'`
- Ex: `geom_histogram` uses `stat_bin()` to group observations

- `position` Some adjust location of objects
- `'dodge'`, `'stack'`, `'jitter'`
]

layout: true

Example: Stat and Position


.pull-left[ #### Star Wars Characters

.font90[
  ```r
  sw_chars <- starwars %>%
    mutate(
      n_movies = map_int(films, length),
      gender = ifelse(
        !gender %in% c('female', 'male'),
        'other', gender)
    ) %>%
    select(name, gender, n_movies)
  ```
  ]]

.pull-right[ r as_table(sw_chars) ]


.left-code[ r ggplot(sw_chars) + aes(x = n_movies) + geom_bar(stat = "count") ]

.right-plot[ r ]


.left-code[ r ggplot(sw_chars) + aes(x = n_movies, fill = gender) + #<< geom_bar(stat = "count") ]

.right-plot[ r ]


.pull-left[.font90[ r sw_chars_id <- sw_chars %>% group_by(n_movies, gender) %>% tally ]]

.pull-right[ r as_table(sw_chars_id) ]


.left-code[ r ggplot(sw_chars_id) + aes(x = n_movies, y = n, fill = gender) + geom_bar(stat = 'identity') #<<

.font80[Note: `geom_col()` is alias for <br>`geom_bar(stat = 'identity')` ]
]

.right-plot[ r ]


.left-code[ r ggplot(sw_chars_id) + aes(x = n_movies, y = n, fill = gender) + geom_col(position = "fill") #<< ]

.right-plot[ r ]


.left-code[ r ggplot(sw_chars_id) + aes(x = n_movies, y = n, fill = gender) + geom_col(position = "dodge") #<< ]

.right-plot[ r ]


layout: false exclude: true

Stat and position are functions too

Stat transformations

lsf.str("package:ggplot2") %>% grep("^stat_", ., value = TRUE)

Position transformations

lsf.str("package:ggplot2") %>% grep("^position_", ., value = TRUE)

layout: true

gg is for Grammar of Graphics

.left-column[ ### Data ### Aesthetics ### Geoms ### Facet

```r g+facet_wrap()

g+facet_grid() ``` ]


```r sw_chars <- starwars %>% mutate( n_movies = map_int(films, length), gender = ifelse( !gender %in% c("female", "male"), "other", gender), hair_color = case_when( str_detect(hair_color, "blond") ~ "blond", str_detect(hair_color, "brown") ~ "brown", str_detect(hair_color, "auburn") ~ "auburn", str_detect(hair_color, "(grey|white)") ~ "grey/white", TRUE ~ "other" ) )

.right-column[
  ```r
  g <- ggplot(sw_chars) +
    aes(x = n_movies,
        fill = gender) +
    geom_bar()
  ```
  ]

---

  .right-column[
    ```r
    g + facet_wrap(~ gender)
    ```
    ]

---

  .right-column[
    ```r
    g + facet_grid(gender ~ hair_color)
    ```
    ]

---

  .right-column[
    ```r
    g + facet_grid(gender ~ hair_color, scales = 'free_y')
    ```
    ]

---
  layout: true
# gg is for Grammar of Graphics

.left-column[
  ### Data
  ### Aesthetics
  ### Geoms
  ### Facet
  ### Labels

  ```r
  g + labs()
  ```
  ]

---

  .right-column[
    ```r
    g <- g +
      labs(
        x = "Film Appearances",
        y = "Count of Characters",
        title = "Recurring Star Wars Characters",
        subtitle = "How often do characters appear?",
        fill = "Gender"
      )
    ```
    ]

---

  .right-column[
    ```r
    g
    ```
    ]

---
  layout: true
# gg is for Grammar of Graphics

.left-column[
  ### Data
  ### Aesthetics
  ### Geoms
  ### Facet
  ### Labels
  ### Scales

  ```r
  g + scale_*_*()
  ```
  ]
---

  .right-column[
    `scale` + `_` + `<aes>` + `_` + `<type>` + `()`

    What parameter do you want to adjust? → `<aes>` <br>
      What type is the parameter? → `<type>`

    - I want to change my discrete x-axis<br>`scale_x_discrete()`
    - I want to change point size from continuous variable<br>`scale_size_continuous()`
    - I want to rescale y-axis as log<br>`scale_y_log10()`
    - I want to use a different color palette<br>`scale_fill_discrete()`<br>`scale_color_manual()`
    ]

---

  .right-column[
    ```r
    g <- g + scale_fill_brewer(palette = 'Set1')
    ```

    ```r
    g
    ```


    ]

---
  layout: true
# gg is for Grammar of Graphics

.left-column[
  ### Data
  ### Aesthetics
  ### Geoms
  ### Facet
  ### Labels
  ### Scales
  ### Theme

  ```r
  g + theme()
  ```
  ]

---

  .right-column[
    Change the appearance of plot decorations<br>
      i.e. things that aren't mapped to data

A few "starter" themes ship with the package

- `g + theme_bw()`
- `g + theme_dark()`
- `g + theme_gray()`
- `g + theme_light()`
- `g + theme_minimal()`

]

---

.right-column[
Huge number of parameters, grouped by plot area:

- Global options: `line`, `rect`, `text`, `title`
- `axis`: x-, y- or other axis title, ticks, lines
- `legend`: Plot legends
- `panel`: Actual plot area
- `plot`: Whole image
- `strip`: Facet labels
]
---

.right-column[
Theme options are supported by helper functions:

- `element_blank()` removes the element
- `element_line()`
- `element_rect()`
- `element_text()`
]

---

.right-column[
```r
g + theme_bw()

]


.right-column[ .font80[

g + theme_minimal() + theme(text = element_text(family = "Palatino"))

] ]


.right-column[ You can also set the theme globally with theme_set()

my_theme <- theme_bw() +
  theme(
    text = element_text(family = "Palatino", size = 12),
    panel.border = element_rect(colour = 'grey80'),
    panel.grid.minor = element_blank()
  )

theme_set(my_theme)

]


.right-column[

g

]


.right-column[

g + theme(legend.position = 'bottom')

]


layout: false count: hide class: fullscreen, inverse, top, left, text-white background-image: url(images/super-grover.jpg)

.font200[You have the power!]

class: inverse, center, middle

"Live" Coding

data(tips, package = "reshape2")

head(tips)

as_table(head(tips, 10))

tips: tip histogram

.left-code[

ggplot(tips) +
  aes(x = tip) +
  geom_histogram(   #<<
    binwidth = 0.25 #<<
  )                 #<<

]

.right-plot[


]


layout: true

tips: tip density


.left-code[

ggplot(tips) +
  aes(x = tip) +
  geom_density(     #<<
    aes(fill = day) #<<
  )                 #<<

]

.right-plot[


]


.left-code[

ggplot(tips) +
  aes(x = tip) +
  geom_density(
    aes(fill = day),
    alpha = 0.4 #<<
  )

]

.right-plot[


]


.left-code[

ggplot(tips) +
  aes(x = tip/total_bill) +
  geom_density(
    aes(fill = day)
  ) +
  facet_wrap(~ day) #<<

]

.right-plot[


]


layout: true

tips: tip vs total


.left-code[

ggplot(tips) +
  aes(x = total_bill,
      y = tip) + #<<
  geom_point()   #<<

]

.right-plot[


]


.left-code[

ggplot(tips) +
  aes(x = total_bill,
      y = tip) +
  geom_point() +
  geom_smooth(method = "lm") #<<

]

.right-plot[


]


.left-code[

ggplot(tips) +
  aes(x = total_bill,
      y = tip) +
  geom_point() +
  geom_smooth(method = "lm")+
  geom_abline(            #<<
    slope = c(0.2, 0.15), #<<
    intercept = 0,        #<<
    color = c('#69b578',
"#dd1144"),
linetype = 3)

]

.right-plot[ r ]


.left-code[ r ggplot(tips) + aes(x = total_bill, y = tip/total_bill) + #<< geom_point() + geom_hline( #<< yintercept = c(0.2, 0.15), color = c('#69b578', "#dd1144"), linetype = 1) ]

.right-plot[ r ]


.left-code[ ```r tips$percent <- #<< tips$tip/tips$total_bill #<<

ggplot(tips) +
  aes(x = size,
      y = percent, #<<
      color = smoker) + #<<
  geom_point()
```
]

.right-plot[ r ]


.left-code[ ```r tips$percent <- tips$tip/tips$total_bill

ggplot(tips) +
  aes(x = size,
      y = percent,
      color = smoker) +
  geom_jitter(width = 0.25) #<<
```
]

.right-plot[ r ]


.left-code[ r ggplot(tips) + aes(x = day, y = percent, color = sex) + geom_jitter(width = 0.25) + facet_grid(time ~ smoker) #<< ]

.right-plot[ r ]


.left-code[ ```r tips <- mutate(tips, time = factor(time, #<< c("Lunch", "Dinner")), #<< day = factor(day, #<< c("Thur", "Fri", #<< "Sat", "Sun") #<< ))

ggplot(tips) +
  aes(x = day,
      y = percent,
      color = sex) +
  geom_jitter(width = 0.25) +
  facet_grid(time ~ smoker)
```
]

.right-plot[ r ]


.left-code[ r ggplot(tips) + aes(x = day, y = percent, fill = time) + geom_boxplot() + #<< facet_grid(. ~ smoker) ]

.right-plot[ r ]


.left-code[ r ggplot(tips) + aes(x = day, y = percent, color = smoker, #<< fill = smoker) + #<< geom_violin(alpha = 0.3) + #<< facet_wrap(~ smoker) ]

.right-plot[ r ]


.left-code[ r g <- ggplot(tips) + aes(x = day, y = percent, color = smoker, fill = smoker) + geom_violin(alpha = 0.3) + geom_jitter(alpha = 0.4, #<< width = 0.25, #<< size = 0.8)+ #<< facet_wrap(~ smoker) g ]

.right-plot[ r ]


.left-code[ r g + guides(color = FALSE, fill = FALSE) + labs(x = '', y = 'Tip Rate') + scale_y_continuous( #<< labels = scales::percent #<< ) #<< ]

.right-plot[ r ]


layout: false class: inverse, center, middle

Level up

data(babynames, 'babynames')

# head(babynames)

The babynames package contains data provided by the USA social security administration:

babynames %>%
  sample_n(6) %>%
  as_table

layout: true

Most popular baby names in 2015


.pull-left[ ```r babynames_pop2015 <- babynames %>% filter(year == 2015) %>% mutate( n = n/1000, sex = case_when( sex == "F" ~ "Girl Names", TRUE ~ "Boy Names" )) %>% group_by(sex) %>% top_n(10, n)

```
]

.pull-right[ r babynames_pop2015 %>% slice(1:4) %>% as_table() ]


```r g_babynames <- ggplot(babynames_pop2015) + {{aes(y = n, x = name)}} + {{geom_col()}}

.plot-callout[
  ```r
  g_babynames
  ```
  ]

---

  ```r
g_babynames <- ggplot(babynames_pop2015) +
  aes(y = n, x = name) +
  geom_col() +
  {{coord_flip()}}

.plot-callout[ r g_babynames ]


```r g_babynames <- ggplot(babynames_pop2015) + {{aes(y = n, x = fct_reorder(name, n))}} + geom_col() + coord_flip()

<br>📦 `fct_reorder` comes from the tidyverse package `forecats`

.plot-callout[
  ```r
  g_babynames
  ```
  ]

---

  ```r
g_babynames <- ggplot(babynames_pop2015) +
  {{aes(y = n, x = fct_reorder(name, n), fill = sex)}} +
  geom_col() +
  coord_flip()

.plot-callout[ r g_babynames ]


```r g_babynames <- ggplot(babynames_pop2015) + aes(y = n, x = fct_reorder(name, n), fill = sex) + geom_col() + coord_flip() + {{facet_wrap( ~ sex, scales = 'free_y')}}

.plot-callout[
  ```r
  g_babynames
  ```
  ]

---

  ```r
g_babynames <- ggplot(babynames_pop2015) +
  aes(y = n, x = fct_reorder(name, n), fill = sex) +
  geom_col() +
  * geom_text(
    *   aes(label = format(n*1000, big.mark = ',')),
    *   size = 9, hjust = 1.1,
    *   color = 'white', family = 'Fira Sans'
    * ) +
  coord_flip() +
  facet_wrap( ~ sex, scales = 'free_y')

.plot-callout[ ```r g_babynames <-ggplot(babynames_pop2015) + aes(y = n, x = fct_reorder(name, n), fill = sex) + geom_col() + geom_text( aes(label = format(n*1000, big.mark = ',')), size = 9, hjust = 1.1, color = 'white', family = 'Fira Sans' ) + coord_flip() + facet_wrap( ~ sex, scales = 'free_y')

g_babynames ``` ]


```r g_babynames + labs(x = '', y = 'Number of Babies Born in 2015 (thousands)') + guides(fill = FALSE) + scale_fill_manual( values = c("Boy Names" = "#77cbb9", "Girl Names" = "#a077cb")) + theme( strip.text = element_text(face = 'bold', size = 20), strip.background = element_blank(), text = element_text(size = 24) )

.plot-callout[
  ```r
  ```
  ]

---

  ```r

layout: true

Gender-bending baby names


Find babynames that were

  1. More "boyish" or "girlish" in pre-1900s and opposite in post-1900s

2. Pick top 10 boy ↔ girl names

```r source('babynames-prep.R') cat("\n\nBoy → Girl Names:
", paste(names_m2f, collapse = ', '))

cat("\n\nGirl → Boy Names:
", paste(names_f2m, collapse = ', '))

---

  .pull-left[
    Data-preprocessing:

      1. Un-tidy `sex` column into `Female` and `Male`

    2. Calculate difference in proportion by name

    3. Add groups for area plot (thank you [stackoverflow](https://stackoverflow.com/a/7883556)!)

    <br><br>Check out `babynames-prep.R` in repo
    ]

.pull-right[
  ```r
  sel_change_babynames %>%
    select(year, name, prop, prop_group) %>%
    sample_n(10) %>%
    arrange(year, name) %>%
    filter(abs(prop) > 0) %>%
    knitr::kable(digits = 5, format = 'html')
  ```
  ]

---

  ```r
ggplot(sel_change_babynames) +
  aes(x = year, y = prop)

.plot-callout[ r ]


```r ggplot(sel_change_babynames) + aes(x = year, y = prop) + {{geom_line(color = "grey50", aes(group=name))}}

.plot-callout[
  ```r
  ```
  ]

---

  ```r
ggplot(sel_change_babynames) +
  aes(x = year, y = prop, fill = prop > 0) +
  {{geom_area(aes(group = prop_group))}} +
  geom_line(color = "grey50", aes(group=name))+
  {{facet_wrap(~ name, scales = 'free_y', ncol = 5)}}

.plot-callout[ r ]


```r g_bnc <- ggplot(sel_change_babynames) + aes(x = year, y = prop, fill = prop > 0) + geom_area(aes(group = prop_group)) + geom_line(color = "grey50", aes(group=name))+ facet_wrap(~ name, scales = 'free_y', ncol = 5) + {{scale_fill_manual(values = c("#6ec4db", "#fa7c92"))}} + {{guides(fill = FALSE)}} + {{labs(x = '', y = '')}}

g_bnc

.plot-callout[
  ```r
  ```
  ]


---

  ```r
g_bnc <- g_bnc +
  theme_minimal(base_family = 'Palatino') +
  theme(
    axis.text.y = element_blank(),
    strip.text = element_text(size = 18, face = 'bold'),
    panel.grid.major.y = element_blank(),
    panel.grid.minor.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.x = element_line(color = "grey80", linetype = 3))

.plot-callout[ r g_bnc ]


r g_bnc


layout: false class: inverse, middle, center

g is for Goodbye


layout: true

Stack Exchange is Awesome



layout: false

ggplot2 Extensions: ggplot2-exts.org


# ggplot2 and beyond

### Learn more

Noteworthy RStudio Add-Ins


Practice and Review

Fun Datasets

Review


class: inverse, center, middle

Thanks!

.font150.text-white[ @grrrck
github.com/gadenbuie
Garrick Aden-Buie ]



wolski/p3003PBC documentation built on Nov. 30, 2024, 7:14 a.m.