class: fullscreen, inverse, top, center, text-white background-image: url("images/letter-g.jpg")
.font150[Brought to you by the letter...]
knitr::opts_chunk$set(fig.width=4.25, fig.height=3.5, fig.retina=3, message=FALSE, warning=FALSE, cache = TRUE, autodep = TRUE, hiline=TRUE) knitr::opts_hooks$set(fig.callout = function(options) { if (options$fig.callout) { options$echo <- FALSE options$out.height <- "99%" options$fig.width <- 16 options$fig.height <- 8 } options }) hook_source <- knitr::knit_hooks$get('source') knitr::knit_hooks$set(source = function(x, options) { if (!is.null(options$hiline) && options$hiline) { x <- stringr::str_replace(x, "^ ?(.+)\\s?#<<", "*\\1") } hook_source(x, options) }) options(htmltools.dir.version = FALSE, width = 90) as_table <- function(...) knitr::kable(..., format='html', digits = 3)
layout: true
.left-column[
Hadley Wickham
]
.right-column[.font150[ The transferrable skills from ggplot2 are not the idiosyncracies of plotting syntax, but a powerful way of thinking about visualisation, as a way of mapping between variables and the visual properties of geometric objects that you can perceive. ] .footnote[http://disq.us/p/sv640d] ]
## My personal reasons
.hl[Functional] data visualization
Wrange data
Tweak scales, guides, axis, labels, theme
Easy to .hl[reason] about how data drives visualization
Easy to .hl[iterate]
Easy to be .hl[consistent]
layout: false
ggplot2
is a huge package: philosophy + functions
...but it's very well organized
--
Lots of examples of not-so-great plots in these slides
...but that's okay
--
Going to throw a lot at you
...but you'll know where and what to look for
--
.img-right[]
--
.img-right[]
layout: true
Easy: install the tidyverse
install.packages('tidyverse')
Medium: install just ggplot2
install.pacakages('ggplot2')
Expert: install from GitHub
devtools::install_github('tidyverse/ggplot2')
library(tidyverse)
library(lubridate) # tidyverse library(reshape2) # install.packages("reshape2") library(babynames) # install.packages("babynames")
layout: true
.left-column[ ### Data
r
g <- ggplot()
]
.right-column[ #### Tidy Data
1. Each variable forms a .hl[column] 2. Each observation forms a .hl[row] 3. Each observational unit forms a table <br><br>The following example draws from ```r data(population, package = "tidyr") ``` ]
.right-column[ ```r pop_simple <- population %>% filter( country %in% c("United States of America", "Canada", "China"), year %in% seq(1995, 2010, 5) ) %>% mutate( country = ifelse(country == "United States of America", "USA", country) )
messy1 <- pop_simple %>% mutate(population = population/10^6) %>% spread(year, population) messy2 <- pop_simple %>% mutate(population = population/10^6) %>% spread(country, population) knitr::kable(messy1, format = 'html') cat('\n<br>\n') knitr::kable(messy2, format = 'html') ``` ]
.right-column[
r
tidy1 <- gather(messy1, 'year', 'population', -country)
as_table(tidy1)
]
.right-column[
r
tidy2 <- gather(messy2, 'country', 'population', -year)
as_table(tidy2)
]
layout: true
.left-column[ ### Data ### Aesthetics
r
g + aes()
.right-column[ Map data to visual elements or parameters
- year - population - country ]
.right-column[ Map data to visual elements or parameters
- year → **x** - population → **y** - country → *shape*, *color*, etc. ]
layout: true
.left-column[ ### Data ### Aesthetics ### Geoms
r
g + geom_*()
]
Geometric objects displayed on the plot:
.font80[
| Type | Function |
|:----:|:--------:|
| Point | geom_point()
|
| Line | geom_line()
|
| Bar | geom_bar()
, geom_col()
|
| Histogram | geom_histogram()
|
| Regression | geom_smooth()
|
| Boxplot | geom_boxplot()
|
| Text | geom_text()
|
| Vert./Horiz. Line | geom_{vh}line()
|
| Count | geom_count()
|
| Density | geom_density()
|
]
.right-column[ Those are just the top 10 most popular geoms1
See <http://ggplot2.tidyverse.org/reference/> for many more options Or just start typing `geom_` in RStudio .font70[ ```r # geom_ old_width = options(width = 80) lsf.str("package:ggplot2") %>% grep("^geom_", ., value = TRUE) options(width = old_width$width) ``` ]]
.footnote[[1] https://eric.netlify.com/2017/08/10/most-popular-ggplot2-geoms/]
layout: true
.left-code[
r
ggplot(tidy1)
]
.right-plot[
r
]
.left-code[
r
ggplot(tidy1) +
aes(x = year, #<<
y = population) #<<
]
.right-plot[
r
]
.left-code[
r
ggplot(tidy1) +
aes(x = year,
y = population) +
geom_point() #<<
]
.right-plot[
r
]
.left-code[
r
ggplot(tidy1) +
aes(x = year,
y = population,
color = country) + #<<
geom_point()
]
.right-plot[
r
]
.left-code[
r
ggplot(tidy1) +
aes(x = year,
y = population,
color = country) +
geom_point() +
geom_line() #<<
.font80[ ```r geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic? ``` ] ]
.right-plot[
r
]
.left-code[
r
ggplot(tidy1) +
aes(x = year,
y = population,
color = country) +
geom_point() +
geom_line(
aes(group = country)) #<<
]
.right-plot[
r
]
layout: true
.left-column[ ### Data ### Aesthetics ### Geoms
r
g + geom_*()
]
.right-column[
r
geom_*(mapping, data, stat, position)
- `data` Geoms can have their own data - Has to map onto global coordinates - `map` Geoms can have their own aesthetics - Inherits global aesthetics - Have geom-specific aesthetics - `geom_point` needs `x` and `y`, optional `shape`, `color`, `size`, etc. - `geom_ribbon` requires `x`, `ymin` and `ymax`, optional `fill` - `?geom_ribbon` ]
.right-column[
r
geom_*(mapping, data, stat, position)
- `stat` Some geoms apply further transformations to the data - All respect `stat = 'identity'` - Ex: `geom_histogram` uses `stat_bin()` to group observations - `position` Some adjust location of objects - `'dodge'`, `'stack'`, `'jitter'` ]
layout: true
.pull-left[ #### Star Wars Characters
.font90[ ```r sw_chars <- starwars %>% mutate( n_movies = map_int(films, length), gender = ifelse( !gender %in% c('female', 'male'), 'other', gender) ) %>% select(name, gender, n_movies) ``` ]]
.pull-right[
r
as_table(sw_chars)
]
.left-code[
r
ggplot(sw_chars) +
aes(x = n_movies) +
geom_bar(stat = "count")
]
.right-plot[
r
]
.left-code[
r
ggplot(sw_chars) +
aes(x = n_movies,
fill = gender) + #<<
geom_bar(stat = "count")
]
.right-plot[
r
]
.pull-left[.font90[
r
sw_chars_id <- sw_chars %>%
group_by(n_movies, gender) %>%
tally
]]
.pull-right[
r
as_table(sw_chars_id)
]
.left-code[
r
ggplot(sw_chars_id) +
aes(x = n_movies,
y = n,
fill = gender) +
geom_bar(stat = 'identity') #<<
.font80[Note: `geom_col()` is alias for <br>`geom_bar(stat = 'identity')` ] ]
.right-plot[
r
]
.left-code[
r
ggplot(sw_chars_id) +
aes(x = n_movies,
y = n,
fill = gender) +
geom_col(position = "fill") #<<
]
.right-plot[
r
]
.left-code[
r
ggplot(sw_chars_id) +
aes(x = n_movies,
y = n,
fill = gender) +
geom_col(position = "dodge") #<<
]
.right-plot[
r
]
layout: false exclude: true
lsf.str("package:ggplot2") %>% grep("^stat_", ., value = TRUE)
lsf.str("package:ggplot2") %>% grep("^position_", ., value = TRUE)
layout: true
.left-column[ ### Data ### Aesthetics ### Geoms ### Facet
```r g+facet_wrap()
g+facet_grid() ``` ]
```r sw_chars <- starwars %>% mutate( n_movies = map_int(films, length), gender = ifelse( !gender %in% c("female", "male"), "other", gender), hair_color = case_when( str_detect(hair_color, "blond") ~ "blond", str_detect(hair_color, "brown") ~ "brown", str_detect(hair_color, "auburn") ~ "auburn", str_detect(hair_color, "(grey|white)") ~ "grey/white", TRUE ~ "other" ) )
.right-column[ ```r g <- ggplot(sw_chars) + aes(x = n_movies, fill = gender) + geom_bar() ``` ] --- .right-column[ ```r g + facet_wrap(~ gender) ``` ] --- .right-column[ ```r g + facet_grid(gender ~ hair_color) ``` ] --- .right-column[ ```r g + facet_grid(gender ~ hair_color, scales = 'free_y') ``` ] --- layout: true # gg is for Grammar of Graphics .left-column[ ### Data ### Aesthetics ### Geoms ### Facet ### Labels ```r g + labs() ``` ] --- .right-column[ ```r g <- g + labs( x = "Film Appearances", y = "Count of Characters", title = "Recurring Star Wars Characters", subtitle = "How often do characters appear?", fill = "Gender" ) ``` ] --- .right-column[ ```r g ``` ] --- layout: true # gg is for Grammar of Graphics .left-column[ ### Data ### Aesthetics ### Geoms ### Facet ### Labels ### Scales ```r g + scale_*_*() ``` ] --- .right-column[ `scale` + `_` + `<aes>` + `_` + `<type>` + `()` What parameter do you want to adjust? → `<aes>` <br> What type is the parameter? → `<type>` - I want to change my discrete x-axis<br>`scale_x_discrete()` - I want to change point size from continuous variable<br>`scale_size_continuous()` - I want to rescale y-axis as log<br>`scale_y_log10()` - I want to use a different color palette<br>`scale_fill_discrete()`<br>`scale_color_manual()` ] --- .right-column[ ```r g <- g + scale_fill_brewer(palette = 'Set1') ``` ```r g ``` ] --- layout: true # gg is for Grammar of Graphics .left-column[ ### Data ### Aesthetics ### Geoms ### Facet ### Labels ### Scales ### Theme ```r g + theme() ``` ] --- .right-column[ Change the appearance of plot decorations<br> i.e. things that aren't mapped to data A few "starter" themes ship with the package - `g + theme_bw()` - `g + theme_dark()` - `g + theme_gray()` - `g + theme_light()` - `g + theme_minimal()` ] --- .right-column[ Huge number of parameters, grouped by plot area: - Global options: `line`, `rect`, `text`, `title` - `axis`: x-, y- or other axis title, ticks, lines - `legend`: Plot legends - `panel`: Actual plot area - `plot`: Whole image - `strip`: Facet labels ] --- .right-column[ Theme options are supported by helper functions: - `element_blank()` removes the element - `element_line()` - `element_rect()` - `element_text()` ] --- .right-column[ ```r g + theme_bw()
]
.right-column[ .font80[
g + theme_minimal() + theme(text = element_text(family = "Palatino"))
] ]
.right-column[
You can also set the theme globally with theme_set()
my_theme <- theme_bw() + theme( text = element_text(family = "Palatino", size = 12), panel.border = element_rect(colour = 'grey80'), panel.grid.minor = element_blank() ) theme_set(my_theme)
]
.right-column[
g
]
.right-column[
g + theme(legend.position = 'bottom')
]
layout: false count: hide class: fullscreen, inverse, top, left, text-white background-image: url(images/super-grover.jpg)
class: inverse, center, middle
data(tips, package = "reshape2")
as_table(head(tips, 10))
.left-code[
ggplot(tips) + aes(x = tip) + geom_histogram( #<< binwidth = 0.25 #<< ) #<<
]
.right-plot[
]
layout: true
.left-code[
ggplot(tips) + aes(x = tip) + geom_density( #<< aes(fill = day) #<< ) #<<
]
.right-plot[
]
.left-code[
ggplot(tips) + aes(x = tip) + geom_density( aes(fill = day), alpha = 0.4 #<< )
]
.right-plot[
]
.left-code[
ggplot(tips) + aes(x = tip/total_bill) + geom_density( aes(fill = day) ) + facet_wrap(~ day) #<<
]
.right-plot[
]
layout: true
.left-code[
ggplot(tips) + aes(x = total_bill, y = tip) + #<< geom_point() #<<
]
.right-plot[
]
.left-code[
ggplot(tips) + aes(x = total_bill, y = tip) + geom_point() + geom_smooth(method = "lm") #<<
]
.right-plot[
]
.left-code[
ggplot(tips) + aes(x = total_bill, y = tip) + geom_point() + geom_smooth(method = "lm")+ geom_abline( #<< slope = c(0.2, 0.15), #<< intercept = 0, #<< color = c('#69b578', "#dd1144"), linetype = 3)
]
.right-plot[
r
]
.left-code[
r
ggplot(tips) +
aes(x = total_bill,
y = tip/total_bill) + #<<
geom_point() +
geom_hline( #<<
yintercept = c(0.2, 0.15),
color = c('#69b578',
"#dd1144"),
linetype = 1)
]
.right-plot[
r
]
.left-code[ ```r tips$percent <- #<< tips$tip/tips$total_bill #<<
ggplot(tips) + aes(x = size, y = percent, #<< color = smoker) + #<< geom_point() ``` ]
.right-plot[
r
]
.left-code[ ```r tips$percent <- tips$tip/tips$total_bill
ggplot(tips) + aes(x = size, y = percent, color = smoker) + geom_jitter(width = 0.25) #<< ``` ]
.right-plot[
r
]
.left-code[
r
ggplot(tips) +
aes(x = day,
y = percent,
color = sex) +
geom_jitter(width = 0.25) +
facet_grid(time ~ smoker) #<<
]
.right-plot[
r
]
.left-code[ ```r tips <- mutate(tips, time = factor(time, #<< c("Lunch", "Dinner")), #<< day = factor(day, #<< c("Thur", "Fri", #<< "Sat", "Sun") #<< ))
ggplot(tips) + aes(x = day, y = percent, color = sex) + geom_jitter(width = 0.25) + facet_grid(time ~ smoker) ``` ]
.right-plot[
r
]
.left-code[
r
ggplot(tips) +
aes(x = day,
y = percent,
fill = time) +
geom_boxplot() + #<<
facet_grid(. ~ smoker)
]
.right-plot[
r
]
.left-code[
r
ggplot(tips) +
aes(x = day,
y = percent,
color = smoker, #<<
fill = smoker) + #<<
geom_violin(alpha = 0.3) + #<<
facet_wrap(~ smoker)
]
.right-plot[
r
]
.left-code[
r
g <- ggplot(tips) +
aes(x = day,
y = percent,
color = smoker,
fill = smoker) +
geom_violin(alpha = 0.3) +
geom_jitter(alpha = 0.4, #<<
width = 0.25, #<<
size = 0.8)+ #<<
facet_wrap(~ smoker)
g
]
.right-plot[
r
]
.left-code[
r
g + guides(color = FALSE,
fill = FALSE) +
labs(x = '',
y = 'Tip Rate') +
scale_y_continuous( #<<
labels = scales::percent #<<
) #<<
]
.right-plot[
r
]
layout: false class: inverse, center, middle
data(babynames, 'babynames')
# head(babynames)
The babynames package contains data provided by the USA social security administration:
babynames
: For each year from 1880 to 2015, the number of children of babynames %>% sample_n(6) %>% as_table
layout: true
.pull-left[ ```r babynames_pop2015 <- babynames %>% filter(year == 2015) %>% mutate( n = n/1000, sex = case_when( sex == "F" ~ "Girl Names", TRUE ~ "Boy Names" )) %>% group_by(sex) %>% top_n(10, n)
``` ]
.pull-right[
r
babynames_pop2015 %>%
slice(1:4) %>%
as_table()
]
```r g_babynames <- ggplot(babynames_pop2015) + {{aes(y = n, x = name)}} + {{geom_col()}}
.plot-callout[ ```r g_babynames ``` ] --- ```r g_babynames <- ggplot(babynames_pop2015) + aes(y = n, x = name) + geom_col() + {{coord_flip()}}
.plot-callout[
r
g_babynames
]
```r g_babynames <- ggplot(babynames_pop2015) + {{aes(y = n, x = fct_reorder(name, n))}} + geom_col() + coord_flip()
<br>📦 `fct_reorder` comes from the tidyverse package `forecats` .plot-callout[ ```r g_babynames ``` ] --- ```r g_babynames <- ggplot(babynames_pop2015) + {{aes(y = n, x = fct_reorder(name, n), fill = sex)}} + geom_col() + coord_flip()
.plot-callout[
r
g_babynames
]
```r g_babynames <- ggplot(babynames_pop2015) + aes(y = n, x = fct_reorder(name, n), fill = sex) + geom_col() + coord_flip() + {{facet_wrap( ~ sex, scales = 'free_y')}}
.plot-callout[ ```r g_babynames ``` ] --- ```r g_babynames <- ggplot(babynames_pop2015) + aes(y = n, x = fct_reorder(name, n), fill = sex) + geom_col() + * geom_text( * aes(label = format(n*1000, big.mark = ',')), * size = 9, hjust = 1.1, * color = 'white', family = 'Fira Sans' * ) + coord_flip() + facet_wrap( ~ sex, scales = 'free_y')
.plot-callout[ ```r g_babynames <-ggplot(babynames_pop2015) + aes(y = n, x = fct_reorder(name, n), fill = sex) + geom_col() + geom_text( aes(label = format(n*1000, big.mark = ',')), size = 9, hjust = 1.1, color = 'white', family = 'Fira Sans' ) + coord_flip() + facet_wrap( ~ sex, scales = 'free_y')
g_babynames ``` ]
```r g_babynames + labs(x = '', y = 'Number of Babies Born in 2015 (thousands)') + guides(fill = FALSE) + scale_fill_manual( values = c("Boy Names" = "#77cbb9", "Girl Names" = "#a077cb")) + theme( strip.text = element_text(face = 'bold', size = 20), strip.background = element_blank(), text = element_text(size = 24) )
.plot-callout[ ```r ``` ] --- ```r
layout: true
Find babynames that were
```r
source('babynames-prep.R')
cat("\n\nBoy → Girl Names:
", paste(names_m2f, collapse = ', '))
cat("\n\nGirl → Boy Names:
", paste(names_f2m, collapse = ', '))
--- .pull-left[ Data-preprocessing: 1. Un-tidy `sex` column into `Female` and `Male` 2. Calculate difference in proportion by name 3. Add groups for area plot (thank you [stackoverflow](https://stackoverflow.com/a/7883556)!) <br><br>Check out `babynames-prep.R` in repo ] .pull-right[ ```r sel_change_babynames %>% select(year, name, prop, prop_group) %>% sample_n(10) %>% arrange(year, name) %>% filter(abs(prop) > 0) %>% knitr::kable(digits = 5, format = 'html') ``` ] --- ```r ggplot(sel_change_babynames) + aes(x = year, y = prop)
.plot-callout[
r
]
```r ggplot(sel_change_babynames) + aes(x = year, y = prop) + {{geom_line(color = "grey50", aes(group=name))}}
.plot-callout[ ```r ``` ] --- ```r ggplot(sel_change_babynames) + aes(x = year, y = prop, fill = prop > 0) + {{geom_area(aes(group = prop_group))}} + geom_line(color = "grey50", aes(group=name))+ {{facet_wrap(~ name, scales = 'free_y', ncol = 5)}}
.plot-callout[
r
]
```r g_bnc <- ggplot(sel_change_babynames) + aes(x = year, y = prop, fill = prop > 0) + geom_area(aes(group = prop_group)) + geom_line(color = "grey50", aes(group=name))+ facet_wrap(~ name, scales = 'free_y', ncol = 5) + {{scale_fill_manual(values = c("#6ec4db", "#fa7c92"))}} + {{guides(fill = FALSE)}} + {{labs(x = '', y = '')}}
g_bnc
.plot-callout[ ```r ``` ] --- ```r g_bnc <- g_bnc + theme_minimal(base_family = 'Palatino') + theme( axis.text.y = element_blank(), strip.text = element_text(size = 18, face = 'bold'), panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank(), panel.grid.minor.x = element_blank(), panel.grid.major.x = element_line(color = "grey80", linetype = 3))
.plot-callout[
r
g_bnc
]
r
g_bnc
layout: false class: inverse, middle, center
layout: true
layout: false
# ggplot2 and beyond
### Learn more
ggplot2 docs: http://ggplot2.tidyverse.org/
R4DS - Data visualization: http://r4ds.had.co.nz/data-visualisation.html
Hadley Wickham's ggplot2 book: https://www.amazon.com/dp/0387981403/
ggplotThemeAssist: Customize your ggplot theme interactively
ggedit: Layer, scale, and theme editing
fivethirtyeight
nycflights
ggplot2movies
population
and who
in tidyr
class: inverse, center, middle
.font150.text-white[
@grrrck
github.com/gadenbuie
Garrick Aden-Buie
]
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.