class: inverse
background-image: url(https://images.unsplash.com/photo-1482685945432-29a7abf2f466?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=1532&q=80)
background-size: cover
knitr::opts_chunk$set(echo = FALSE, comment = "", message = FALSE,
                      warning = FALSE, cache = TRUE, fig.retina = 3)
library(tidyverse)
library(flipbookr)
library(xaringanthemer)
xaringanthemer::mono_light(
  base_color = "#02075D",
  # header_font_google = google_font("Josefin Sans"),
  # text_font_google = google_font("Montserrat", "200", "200i"),
  # code_font_google = google_font("Droid Mono"),
  text_font_size = ".85cm",
  code_font_size = ".15cm")
theme_set(theme_gray(base_size = 20))
class: inverse, center, middle
???
The grammar of graphics framework, proposed by Leland Wilkinson in 1999, identified 'seven orthogonal components' in the creation of data visualizations. Wilkinson asserted that if data visualization packages were built with a separation-of-concerns approach, dividing the decision making among these components, they would be able to "draw every statistical graphic". The grammar of graphics principles proved powerful and gave rise to a number of visualization platforms, including Tableau, vega-lite, and ggplot2.
--
--
--
The transferable skills from ggplot2 are not the idiosyncrasies of plotting syntax, but a powerful way of thinking about visualisation, as a way of mapping between variables and the visual properties of geometric objects that you can perceive.
???
Statistical educators who introduce students to one of these tools are arguably doing more than constructing one-off plots to discuss statistical principles: they are introducing students to 'a powerful way of thinking about data visualization'.
Statistical educators often choose ggplot2 as their grammar-of-graphics-based visualization tool because students can learn it alongside the rich statistical ecosystem of the R programming language. R may thus serve as a one-stop shop for statistical tooling, and with recent developments in packages and IDEs, writing code is becoming more accessible and welcoming to newcomers.
???
Still, using ggplot2 for statistical education can be a challenge at times. When it is used to discuss statistical concepts, getting something done with the plotting library can derail the discussion of the concepts and material themselves.
class: inverse, center, middle
--
library(ggplot2)
ggplot(airquality) +
  aes(x = Ozone) +
  geom_histogram() +
  ggxmean::geom_x_mean()
???
Consider, for example, the seemingly simple enterprise of adding a vertical line at the mean of x, perhaps atop a histogram or density plot.
r chunk_reveal("basic", title = "### Adding the mean at x w/ base ggplot2")
ggplot(data = airquality) +
  aes(x = Ozone) +
  geom_histogram() +
  geom_vline(
    xintercept = mean(airquality$Ozone, na.rm = TRUE)
  )
???
Creating this plot demands more attention to ggplot2 syntax, which likely detracts from the discussion of the mean that the instructor wants to have.
It may require a discussion about dollar sign syntax and how geom_vline is actually a special geom (an annotation) rather than being mapped to the data. None of this is relevant to the point you as an instructor aim to make: maybe that the mean is the balancing point of the data, or maybe a comment about skewness.
airquality %>%
  group_by(Month) %>%
  summarise(
    Ozone_mean = mean(Ozone, na.rm = TRUE)
  ) ->
airquality_by_month

ggplot(airquality) +
  aes(x = Ozone) +
  geom_histogram() +
  facet_grid(rows = vars(Month)) +
  geom_vline(data = airquality_by_month,
             aes(xintercept = Ozone_mean))
???
Further, adding a vertical line at the mean for different subsets of the data requires a different approach. This enterprise may take the instructor, analyst, or student on an even larger detour, possibly googling and maybe landing on the following Stack Overflow page, where 11,000 analytics souls (some repeats, to be sure) have landed:
The solutions to this problem involve manipulating the data before plotting it, which disrupts the forward flow of the ggplot build. One must pause, perhaps toggling back and forth between Stack Overflow solutions, and lose the momentum of the discussion about the pooled mean and the conditional mean.
r chunk_reveal("cond_means_hard", title = "#### Conditional means (may require a trip to stackoverflow!)")
--
class: inverse, center, bottom
background-image: url(https://images.unsplash.com/photo-1482685945432-29a7abf2f466?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=1532&q=80)
background-size: cover
--

... ggplot is known for being able to 'speak your plot into existence'.
--
--
--
--
r chunk_reveal("base_gg_flow")
ggplot(cars) +
  aes(x = speed) +
  aes(y = dist) +
  geom_point() +
  geom_smooth() ->
g1

library(palmerpenguins)
ggplot(penguins) +
  aes(x = species) +
  aes(y = bill_length_mm) +
  geom_point() +
  geom_boxplot()
--
--
--

- # for students

--
class: center, inverse, middle
--
https://github.com/EvaMaeRey/ggxmean
r chunk_reveal("geom_x_mean", title = "geom_x_mean()")
library(ggplot2)
library(ggxmean)
ggplot(airquality) +
  aes(x = Ozone) +
  geom_histogram() +
  geom_x_mean() +
  facet_grid(rows = vars(Month))
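For readers curious how a layer like `geom_x_mean()` can compute the mean inside the plotting pipeline, here is a minimal sketch using ggplot2's extension mechanism (`ggproto()`, `Stat`, `layer()`). The names `StatXmeanSketch` and `geom_x_mean_sketch()` are hypothetical, and this is not claimed to be ggxmean's actual source. It only illustrates the idea that the statistic is computed per group within each panel, which is why faceting needs no separate data preparation.

```r
library(ggplot2)

# Hypothetical Stat: reduce each group's x values to a single xintercept.
StatXmeanSketch <- ggproto(
  "StatXmeanSketch", Stat,
  required_aes = "x",
  dropped_aes = "x",  # x is consumed by the summary and not passed on
  compute_group = function(data, scales) {
    data.frame(xintercept = mean(data$x, na.rm = TRUE))
  }
)

# Hypothetical layer function: pair the Stat with the built-in vertical-line geom.
geom_x_mean_sketch <- function(mapping = NULL, data = NULL, ...,
                               na.rm = FALSE, show.legend = NA,
                               inherit.aes = TRUE) {
  layer(
    stat = StatXmeanSketch, geom = GeomVline,
    data = data, mapping = mapping, position = "identity",
    show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

# Same build as the slide, with the sketch layer swapped in.
p <- ggplot(airquality) +
  aes(x = Ozone) +
  geom_histogram(bins = 30) +
  geom_x_mean_sketch() +
  facet_grid(rows = vars(Month))
```

Because the summary lives in the Stat, the plot is still written top to bottom with no detour into `group_by()` and `summarise()`.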
r chunk_reveal("univariate", title = "### Other univariate markers")
ggplot(airquality) +
  aes(x = Ozone) +
  geom_histogram() +
  geom_x_median() +
  geom_x_quantile(
    quantile = .25,
    linetype = "dashed"
  ) +
  geom_x_percentile(
    percentile = 100,
    linetype = "dotted"
  )
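As a cross-check on what those markers represent, the same summaries can be computed directly in base R. This sketch assumes that `geom_x_quantile(quantile = .25)` marks the first quartile and that `geom_x_percentile(percentile = 100)` marks the maximum; those readings are inferred from the argument names, not taken from the package documentation.

```r
# Ozone has missing values, so drop them before summarising.
ozone <- airquality$Ozone[!is.na(airquality$Ozone)]

markers <- c(
  median         = median(ozone),
  first_quartile = unname(quantile(ozone, probs = 0.25)),
  maximum        = unname(quantile(ozone, probs = 1))  # 100th percentile
)
markers
```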
r chunk_reveal("lm_sequence", title = "### Relevant to OLS", widths = c(1,1))
ggplot(data = cars) +
  aes(speed, dist) +
  geom_point() + #BREAK
  ggxmean::geom_lm() + #BREAK
  ggxmean::geom_lm_fitted(color = "blue", size = 3) + #BREAK
  ggxmean::geom_lm_residuals() + #BREAK
  ggxmean::geom_lm_conf_int() + #BREAK
  ggxmean::geom_lm_intercept(color = "red", size = 5) + #BREAK
  ggxmean::geom_lm_formula(size = 10) #BREAK
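The quantities drawn by the layers above (fitted line, fitted values, residuals, confidence band, intercept) all come from an ordinary least squares fit. Here is a minimal base R sketch of the same fit, for connecting the picture to numbers. How ggxmean computes these internally is not shown in these slides, so treat the correspondence as an assumption.

```r
fit <- lm(dist ~ speed, data = cars)

coef(fit)                      # intercept and slope of the fitted line
fitted_values <- fitted(fit)   # points on the line at each observed speed
ols_residuals <- resid(fit)    # vertical gaps between observed dist and the line

# With an intercept in the model, OLS residuals sum to zero (up to rounding).
sum(ols_residuals)

# Pointwise confidence band for the mean response at the observed speeds.
conf_band <- predict(fit, interval = "confidence")
head(conf_band)
```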
r chunk_reveal("geom_normal_dist", title = "### fitting distributions", widths = c(1,1))
ggplot(data = faithful) +
  aes(waiting) +
  geom_rug() +
  geom_histogram(
    aes(y = after_stat(density))) + #BREAK
  geom_normal_dist(fill = "blue") + #BREAK
  facet_grid(rows = vars(eruptions > 3)) #BREAK
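The normal curve overlaid in each facet can be thought of as a normal density with that facet's sample mean and standard deviation plugged in. The sketch below computes those per-facet parameters in base R. That `geom_normal_dist()` uses exactly this estimator is an assumption, not something these slides confirm.

```r
# Split waiting times by the same condition used for faceting.
groups <- split(faithful$waiting, faithful$eruptions > 3)

# Estimate a mean and standard deviation for each facet.
params <- sapply(groups, function(w) c(mean = mean(w), sd = sd(w)))
params

# Evaluate the fitted normal density on a grid for the long-eruption facet.
grid <- seq(min(faithful$waiting), max(faithful$waiting), length.out = 200)
dens_long <- dnorm(grid,
                   mean = params["mean", "TRUE"],
                   sd   = params["sd", "TRUE"])
```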
class: inverse, center, bottom
background-image: url(https://images.unsplash.com/photo-1482685945432-29a7abf2f466?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=1532&q=80)
background-size: cover
--
--