class: inverse
title: "Extending ggplot2 statistical geometries" subtitle: "MAA Metro New York" author: "Gina Reynolds" date: 'Saturday May 1, 2021, 11AM' output: xaringan::moon_reader: lib_dir: libs seal: false nature: ratio: 16:10 highlightStyle: github highlightLines: true countIncrementalSlides: false beforeInit: "https://platform.twitter.com/widgets.js"
class: inverse
class: inverse background-image: url(https://images.unsplash.com/photo-1482685945432-29a7abf2f466?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=1532&q=80) background-size: cover # Extending ggplot2 statistical geometries ## MAA Metro New York ### Dr. Evangeline 'Gina' Reynolds ### Saturday May 1, 2021, 11AM #### Photo Credit: Mike Lewis HeadSmart Media ```r knitr::opts_chunk$set(echo = T, comment = "", message = F, warning = F, cache = T, fig.retina = 3) library(tidyverse) library(flipbookr) library(xaringanthemer) xaringanthemer::mono_light( base_color = "#02075D", # header_font_google = google_font("Josefin Sans"), # text_font_google = google_font("Montserrat", "200", "200i"), # code_font_google = google_font("Droid Mono"), text_font_size = ".85cm", code_font_size = ".15cm") theme_set(theme_gray(base_size = 20)) ```
class: inverse background-image: url(https://images.unsplash.com/photo-1482685945432-29a7abf2f466?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=1532&q=80) background-size: cover
knitr::opts_chunk$set(echo = T, comment = "", message = F, warning = F, cache = T, fig.retina = 3) library(tidyverse) library(flipbookr) library(xaringanthemer) xaringanthemer::mono_light( base_color = "#02075D", # header_font_google = google_font("Josefin Sans"), # text_font_google = google_font("Montserrat", "200", "200i"), # code_font_google = google_font("Droid Mono"), text_font_size = ".85cm", code_font_size = ".15cm") theme_set(theme_gray(base_size = 20))
class: inverse
![](https://www2.stat.duke.edu/courses/Spring20/sta199.002/slides/img/02a/grammar-of-graphics.png)
class: inverse
class: inverse, center middle # If visualization packages used this approach they would be able to "draw *every* statistical graphic". ??? The grammar of graphics framework, proposed by Leland Wilkinson in 1999, that identified 'seven orthogonal components' in the creation of data visualizations.\ Wilkinson asserted that if data visualization packages were created using a separation of concerns approach -- dividing decision making surrounding these components --- the packages would be able to "draw every statistical graphic". The grammar of graphic principles were incredibly powerful and gave rise to a number of visualization platforms including Tableau, vega-lite, and ggplot2.
class: inverse, center middle
???
The grammar of graphics framework, proposed by Leland Wilkinson in 1999, that identified 'seven orthogonal components' in the creation of data visualizations.\ Wilkinson asserted that if data visualization packages were created using a separation of concerns approach -- dividing decision making surrounding these components --- the packages would be able to "draw every statistical graphic". The grammar of graphic principles were incredibly powerful and gave rise to a number of visualization platforms including Tableau, vega-lite, and ggplot2.
class: inverse
# Creators enjoy freedom, fine control -- # Grammar of graphics inspired: -- #- Tableau #- vega-lite #- ggplot2
--
--
class: inverse
### Use ggplot2 in the classroom because R, and ... -- ![](https://pkg.garrickadenbuie.com/gentle-ggplot2/images/hadley.jpg) > The transferrable skills from ggplot2 are not the idiosyncracies of plotting syntax, but a **powerful way of thinking about visualisation**, as a way of mapping between variables and the visual properties of geometric objects that you can perceive. ??? Statistical educators that introduce students to one of these tools arguably are doing more than constructing one-off plots to discuss statistical principles with students: they are introducing students to 'a powerful way of thinking about data visualization'. Statistical educators often use ggplot2 as their grammar-of-graphics-based data visualization tool as students can learn it along side the rich statistical ecosystem of the R programming language. The R programming language thus may serve as a one-stop-shop for statistical tooling; with recent developments in packages and IDEs writing code is becoming more accessible and welcoming to newcomers.
--
The transferrable skills from ggplot2 are not the idiosyncracies of plotting syntax, but a powerful way of thinking about visualisation, as a way of mapping between variables and the visual properties of geometric objects that you can perceive.
???
Statistical educators that introduce students to one of these tools arguably are doing more than constructing one-off plots to discuss statistical principles with students: they are introducing students to 'a powerful way of thinking about data visualization'.
Statistical educators often use ggplot2 as their grammar-of-graphics-based data visualization tool as students can learn it along side the rich statistical ecosystem of the R programming language. The R programming language thus may serve as a one-stop-shop for statistical tooling; with recent developments in packages and IDEs writing code is becoming more accessible and welcoming to newcomers.
class: inverse
??? Still, using ggplot2 for statistical education can be a challenge at times. When it is used to discuss statistical concepts, sometimes it feels as though getting something done with the plotting library derails the focus on discussion of statistical concepts and material.
???
Still, using ggplot2 for statistical education can be a challenge at times. When it is used to discuss statistical concepts, sometimes it feels as though getting something done with the plotting library derails the focus on discussion of statistical concepts and material.
class: inverse
class: inverse, center, middle # The problem -- ## sometimes ggplot2 is hard
class: inverse, center, middle
--
class: inverse
## The status quo: adding the mean ```r library(ggplot2) ggplot(airquality) + aes(x = Ozone) + geom_histogram() + ggxmean::geom_x_mean() ``` ??? Consider for example, a the seemingly simple enterprise of adding a vertical line at the mean of x, perhaps atop a histogram or density plot.
library(ggplot2) ggplot(airquality) + aes(x = Ozone) + geom_histogram() + ggxmean::geom_x_mean()
???
Consider for example, a the seemingly simple enterprise of adding a vertical line at the mean of x, perhaps atop a histogram or density plot.
class: inverse
`r chunk_reveal("basic", title = "### Adding the mean at x w/ base ggplot2")` ```r ggplot(data = airquality) + aes(x = Ozone) + geom_histogram() + geom_vline( xintercept = mean(airquality$Ozone, na.rm = T) ) ``` ??? Creating this plot requires greater focus on ggplot2 *syntax*, likely detracting from discussion of *the mean* that statistical instructors desire. It may require a discussion about dollar sign syntax and how geom_vline is actually a special geom -- an annotation -- rather than being mapped to the data. None of this is relevant to the point you as an instructor aim to make: maybe that the the mean is the balancing point of the data or maybe a comment about skewness.
r chunk_reveal("basic", title = "### Adding the mean at x w/ base ggplot2")
ggplot(data = airquality) + aes(x = Ozone) + geom_histogram() + geom_vline( xintercept = mean(airquality$Ozone, na.rm = T) )
???
Creating this plot requires greater focus on ggplot2 syntax, likely detracting from discussion of the mean that statistical instructors desire.
It may require a discussion about dollar sign syntax and how geom_vline is actually a special geom -- an annotation -- rather than being mapped to the data. None of this is relevant to the point you as an instructor aim to make: maybe that the the mean is the balancing point of the data or maybe a comment about skewness.
class: inverse
### Adding the conditional means ```r airquality %>% group_by(Month) %>% summarise( Ozone_mean = mean(Ozone, na.rm = T) ) -> airquality_by_month ggplot(airquality) + aes(x = Ozone) + geom_histogram() + facet_grid(rows = vars(Month)) + geom_vline(data = airquality_by_month, aes(xintercept = Ozone_mean)) ``` ??? Further, for the case of adding a vertical line at the mean for different subsets of the data, a different approach is required. This enterprise may take instructor/analyst/student on an even larger detour -- possibly googling, and maybe landing on the following stack overflow page where 11,000 analytics souls (some repeats to be sure) have landed: <https://stackoverflow.com/questions/1644661/add-a-vertical-line-with-different-intercept-for-each-panel-in-ggplot2> The solutions to this problem involve data manipulation *prior* to plotting the data. The solution disrupts the forward flow ggplot build. One must take a pause, which may involve toggling back and forth between stack overflow solutions, disrupting momentum you are working on to talk about the pooled mean and the conditional mean.
airquality %>% group_by(Month) %>% summarise( Ozone_mean = mean(Ozone, na.rm = T) ) -> airquality_by_month ggplot(airquality) + aes(x = Ozone) + geom_histogram() + facet_grid(rows = vars(Month)) + geom_vline(data = airquality_by_month, aes(xintercept = Ozone_mean))
???
Further, for the case of adding a vertical line at the mean for different subsets of the data, a different approach is required. This enterprise may take instructor/analyst/student on an even larger detour -- possibly googling, and maybe landing on the following stack overflow page where 11,000 analytics souls (some repeats to be sure) have landed:
The solutions to this problem involve data manipulation prior to plotting the data. The solution disrupts the forward flow ggplot build. One must take a pause, which may involve toggling back and forth between stack overflow solutions, disrupting momentum you are working on to talk about the pooled mean and the conditional mean.
class: inverse
`r chunk_reveal("cond_means_hard", title = "#### Conditional means (may require a trip to stackoverflow!)")`
r chunk_reveal("cond_means_hard", title = "#### Conditional means (may require a trip to stackoverflow!)")
class: inverse
# "...mastery of advanced programming skills should not be allowed to crowd out data analysis skills or statistical thinking." -- GAISE report -- ![](https://media.giphy.com/media/ycBW5N5XMfpXTvamtF/giphy.gif)
--
class: inverse
class: inverse, center, bottom background-image: url(https://images.unsplash.com/photo-1482685945432-29a7abf2f466?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=1532&q=80) background-size: cover # But other statistical geometries in ggplot2 flow! -- ... ggplot is known for being able to 'speak your plot into existence'.
class: inverse, center, bottom background-image: url(https://images.unsplash.com/photo-1482685945432-29a7abf2f466?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=1532&q=80) background-size: cover
-- ... ggplot is known for being able to 'speak your plot into existence'.
class: inverse
# Declarative -- ## "Let there be a a box plot..." -- ## and there was a boxplot. -- ## "Let there be LOESS smoothing..." -- ## and there was loess smoothing
--
--
--
--
class: inverse
`r chunk_reveal("base_gg_flow")` ```r ggplot(cars) + aes(x = speed) + aes(y = dist) + geom_point() + geom_smooth() -> g1 library(palmerpenguins) ggplot(penguins) + aes(x = species) + aes(y = bill_length_mm) + geom_point() + geom_boxplot() ```
r chunk_reveal("base_gg_flow")
ggplot(cars) + aes(x = speed) + aes(y = dist) + geom_point() + geom_smooth() -> g1 library(palmerpenguins) ggplot(penguins) + aes(x = species) + aes(y = bill_length_mm) + geom_point() + geom_boxplot()
class: inverse
# But, infrastructure to *extend* ggplot2 ... -- -- ![](https://media.giphy.com/media/relSRdIMtKVcPrSfyT/giphy.gif)
--
--
class: inverse
## Build new statistical geometries that deliver the *flow* experience! -- - # for students -- - # for instructors -- - # for analysts
-- - # for students --
class: inverse
class: center, inverse, middle # Introducing {ggxmean} -- https://github.com/EvaMaeRey/ggxmean
class: center, inverse, middle
--
https://github.com/EvaMaeRey/ggxmean
class: inverse
`r chunk_reveal("geom_x_mean", title = "geom_x_mean()")` ```r library(ggplot2) library(ggxmean) ggplot(airquality) + aes(x = Ozone) + geom_histogram() + geom_x_mean() + facet_grid(rows = vars(Month)) ```
r chunk_reveal("geom_x_mean", title = "geom_x_mean()")
library(ggplot2) library(ggxmean) ggplot(airquality) + aes(x = Ozone) + geom_histogram() + geom_x_mean() + facet_grid(rows = vars(Month))
class: inverse
`r chunk_reveal("univariate", title = "### Other univariate markers") ` ```r ggplot(airquality) + aes(x = Ozone) + geom_histogram() + geom_x_median() + geom_x_quantile( quantile = .25, linetype = "dashed" ) + geom_x_percentile( percentile = 100, linetype = "dotted" ) ```
r chunk_reveal("univariate", title = "### Other univariate markers")
ggplot(airquality) + aes(x = Ozone) + geom_histogram() + geom_x_median() + geom_x_quantile( quantile = .25, linetype = "dashed" ) + geom_x_percentile( percentile = 100, linetype = "dotted" )
class: inverse
`r chunk_reveal("lm_sequence", title = "### Relevant to OLS", widths = c(1,1)) ` ```r ggplot(data = cars) + aes(speed, dist) + geom_point() + #BREAK ggxmean::geom_lm() + #BREAK ggxmean::geom_lm_fitted(color = "blue", size = 3) + #BREAK ggxmean::geom_lm_residuals() + #BREAK ggxmean::geom_lm_conf_int() + #BREAK ggxmean::geom_lm_intercept(color = "red", size = 5) + #BREAK ggxmean::geom_lm_formula(size = 10) #BREAK ```
r chunk_reveal("lm_sequence", title = "### Relevant to OLS", widths = c(1,1))
ggplot(data = cars) + aes(speed, dist) + geom_point() + #BREAK ggxmean::geom_lm() + #BREAK ggxmean::geom_lm_fitted(color = "blue", size = 3) + #BREAK ggxmean::geom_lm_residuals() + #BREAK ggxmean::geom_lm_conf_int() + #BREAK ggxmean::geom_lm_intercept(color = "red", size = 5) + #BREAK ggxmean::geom_lm_formula(size = 10) #BREAK
class: inverse
`r chunk_reveal("geom_normal_dist", title = "### fitting distributions", widths = c(1,1)) ` ```r ggplot(data = faithful) + aes(waiting) + geom_rug() + geom_histogram( aes(y = ..density..)) + #BREAK geom_normal_dist(fill = "blue") + #BREAK facet_grid(rows = vars(eruptions > 3)) #BREAK ```
r chunk_reveal("geom_normal_dist", title = "### fitting distributions", widths = c(1,1))
ggplot(data = faithful) + aes(waiting) + geom_rug() + geom_histogram( aes(y = ..density..)) + #BREAK geom_normal_dist(fill = "blue") + #BREAK facet_grid(rows = vars(eruptions > 3)) #BREAK
class: inverse
class: inverse
class: inverse, center, bottom background-image: url(https://images.unsplash.com/photo-1482685945432-29a7abf2f466?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=1532&q=80) background-size: cover # Thank you! -- ## {ggxmean} development package: https://evamaerey.github.io/ggxmean/ -- # Feedback welcome!
class: inverse, center, bottom background-image: url(https://images.unsplash.com/photo-1482685945432-29a7abf2f466?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=1532&q=80) background-size: cover
--
--
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.