class: inverse

From source .Rmd:


 title: "Extending ggplot2 statistical geometries"
 subtitle: "MAA Metro New York"  
 author: "Gina Reynolds"
 date: 'Saturday May 1, 2021, 11AM'
 output:
   xaringan::moon_reader:
     lib_dir: libs
     seal: false
     nature:
       ratio: 16:10
       highlightStyle: github
       highlightLines: true
       countIncrementalSlides: false
       beforeInit: "https://platform.twitter.com/widgets.js"

class: inverse

From source .Rmd:


class: inverse
background-image: url(https://images.unsplash.com/photo-1482685945432-29a7abf2f466?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=1532&q=80)
background-size: cover

# Extending ggplot2 statistical geometries
## MAA Metro New York
### Dr. Evangeline 'Gina' Reynolds

### Saturday May 1, 2021, 11AM
#### Photo Credit: Mike Lewis HeadSmart Media




```r
 knitr::opts_chunk$set(echo = T, comment = "", message = F, 
                       warning = F, cache = T, fig.retina = 3)
 library(tidyverse)
 library(flipbookr)
 library(xaringanthemer)
 xaringanthemer::mono_light(
   base_color = "#02075D",
   # header_font_google = google_font("Josefin Sans"),
   # text_font_google   = google_font("Montserrat", "200", "200i"),
   # code_font_google   = google_font("Droid Mono"),
   text_font_size = ".85cm",
   code_font_size = ".15cm")
theme_set(theme_gray(base_size = 20))
```

class: inverse background-image: url(https://images.unsplash.com/photo-1482685945432-29a7abf2f466?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=1532&q=80) background-size: cover

Extending ggplot2 statistical geometries

MAA Metro New York

Dr. Evangeline 'Gina' Reynolds

Saturday May 1, 2021, 11AM

Photo Credit: Mike Lewis HeadSmart Media

 knitr::opts_chunk$set(echo = T, comment = "", message = F, 
                       warning = F, cache = T, fig.retina = 3)
 library(tidyverse)
 library(flipbookr)
 library(xaringanthemer)
 xaringanthemer::mono_light(
   base_color = "#02075D",
   # header_font_google = google_font("Josefin Sans"),
   # text_font_google   = google_font("Montserrat", "200", "200i"),
   # code_font_google   = google_font("Droid Mono"),
   text_font_size = ".85cm",
   code_font_size = ".15cm")
theme_set(theme_gray(base_size = 20))

class: inverse

From source .Rmd:


![](https://www2.stat.duke.edu/courses/Spring20/sta199.002/slides/img/02a/grammar-of-graphics.png)


class: inverse

From source .Rmd:


class: inverse, center middle


# If visualization packages used this approach they would be able to "draw *every* statistical graphic".


???



The grammar of graphics framework, proposed by Leland Wilkinson in 1999, that identified 'seven orthogonal components' in the creation of data visualizations.\
Wilkinson asserted that if data visualization packages were created using a separation of concerns approach -- dividing decision making surrounding these components --- the packages would be able to "draw every statistical graphic". The grammar of graphic principles were incredibly powerful and gave rise to a number of visualization platforms including Tableau, vega-lite, and ggplot2.

class: inverse, center middle

If visualization packages used this approach they would be able to "draw every statistical graphic".

???

The grammar of graphics framework, proposed by Leland Wilkinson in 1999, that identified 'seven orthogonal components' in the creation of data visualizations.\ Wilkinson asserted that if data visualization packages were created using a separation of concerns approach -- dividing decision making surrounding these components --- the packages would be able to "draw every statistical graphic". The grammar of graphic principles were incredibly powerful and gave rise to a number of visualization platforms including Tableau, vega-lite, and ggplot2.


class: inverse

From source .Rmd:


# Creators enjoy freedom, fine control

--

# Grammar of graphics inspired:

--

#- Tableau
#- vega-lite
#- ggplot2

Creators enjoy freedom, fine control

--

Grammar of graphics inspired:

--

- Tableau

- vega-lite

- ggplot2


class: inverse

From source .Rmd:


### Use ggplot2 in the classroom because R, and ...

--

![](https://pkg.garrickadenbuie.com/gentle-ggplot2/images/hadley.jpg)


> The transferrable skills from ggplot2 are not the idiosyncracies of plotting syntax, but a **powerful way of thinking about visualisation**, as a way of mapping between variables and the visual properties of geometric objects that you can perceive.


???

Statistical educators that introduce students to one of these tools arguably are doing more than constructing one-off plots to discuss statistical principles with students: they are introducing students to 'a powerful way of thinking about data visualization'.

Statistical educators often use ggplot2 as their grammar-of-graphics-based data visualization tool as students can learn it along side the rich statistical ecosystem of the R programming language. The R programming language thus may serve as a one-stop-shop for statistical tooling; with recent developments in packages and IDEs writing code is becoming more accessible and welcoming to newcomers.

Use ggplot2 in the classroom because R, and ...

--

The transferrable skills from ggplot2 are not the idiosyncracies of plotting syntax, but a powerful way of thinking about visualisation, as a way of mapping between variables and the visual properties of geometric objects that you can perceive.

???

Statistical educators that introduce students to one of these tools arguably are doing more than constructing one-off plots to discuss statistical principles with students: they are introducing students to 'a powerful way of thinking about data visualization'.

Statistical educators often use ggplot2 as their grammar-of-graphics-based data visualization tool as students can learn it along side the rich statistical ecosystem of the R programming language. The R programming language thus may serve as a one-stop-shop for statistical tooling; with recent developments in packages and IDEs writing code is becoming more accessible and welcoming to newcomers.


class: inverse

From source .Rmd:


???


Still, using ggplot2 for statistical education can be a challenge at times. When it is used to discuss statistical concepts, sometimes it feels as though getting something done with the plotting library derails the focus on discussion of statistical concepts and material.

???

Still, using ggplot2 for statistical education can be a challenge at times. When it is used to discuss statistical concepts, sometimes it feels as though getting something done with the plotting library derails the focus on discussion of statistical concepts and material.


class: inverse

From source .Rmd:


class: inverse, center, middle

# The problem

--

## sometimes ggplot2 is hard

class: inverse, center, middle

The problem

--

sometimes ggplot2 is hard


class: inverse

From source .Rmd:


## The status quo: adding the mean

```r
library(ggplot2)
ggplot(airquality) + 
  aes(x = Ozone) + 
  geom_histogram() + 
  ggxmean::geom_x_mean()
```


???

Consider for example, a the seemingly simple enterprise of adding a vertical line at the mean of x, perhaps atop a histogram or density plot.

The status quo: adding the mean

library(ggplot2)
ggplot(airquality) + 
  aes(x = Ozone) + 
  geom_histogram() + 
  ggxmean::geom_x_mean()

???

Consider for example, a the seemingly simple enterprise of adding a vertical line at the mean of x, perhaps atop a histogram or density plot.


class: inverse

From source .Rmd:


`r chunk_reveal("basic", title = "### Adding the mean at x w/ base ggplot2")`

```r
ggplot(data = airquality) + 
  aes(x = Ozone) + 
  geom_histogram() + 
  geom_vline(
    xintercept = 
      mean(airquality$Ozone, 
           na.rm = T)
    )
```


???

Creating this plot requires greater focus on ggplot2 *syntax*, likely detracting from discussion of *the mean* that statistical instructors desire.

It may require a discussion about dollar sign syntax and how geom_vline is actually a special geom -- an annotation -- rather than being mapped to the data. None of this is relevant to the point you as an instructor aim to make: maybe that the the mean is the balancing point of the data or maybe a comment about skewness.

r chunk_reveal("basic", title = "### Adding the mean at x w/ base ggplot2")

ggplot(data = airquality) + 
  aes(x = Ozone) + 
  geom_histogram() + 
  geom_vline(
    xintercept = 
      mean(airquality$Ozone, 
           na.rm = T)
    )

???

Creating this plot requires greater focus on ggplot2 syntax, likely detracting from discussion of the mean that statistical instructors desire.

It may require a discussion about dollar sign syntax and how geom_vline is actually a special geom -- an annotation -- rather than being mapped to the data. None of this is relevant to the point you as an instructor aim to make: maybe that the the mean is the balancing point of the data or maybe a comment about skewness.


class: inverse

From source .Rmd:


### Adding the conditional means

```r
airquality %>% 
  group_by(Month) %>% 
  summarise(
    Ozone_mean = 
      mean(Ozone, na.rm = T)
    ) ->
airquality_by_month

ggplot(airquality) + 
  aes(x = Ozone) + 
  geom_histogram() + 
  facet_grid(rows = vars(Month)) +
  geom_vline(data = airquality_by_month, 
             aes(xintercept = 
               Ozone_mean))
```


???

Further, for the case of adding a vertical line at the mean for different subsets of the data, a different approach is required. This enterprise may take instructor/analyst/student on an even larger detour -- possibly googling, and maybe landing on the following stack overflow page where 11,000 analytics souls (some repeats to be sure) have landed:

<https://stackoverflow.com/questions/1644661/add-a-vertical-line-with-different-intercept-for-each-panel-in-ggplot2>

The solutions to this problem involve data manipulation *prior* to plotting the data. The solution disrupts the forward flow ggplot build. One must take a pause, which may involve toggling back and forth between stack overflow solutions, disrupting momentum you are working on to talk about the pooled mean and the conditional mean.

Adding the conditional means

airquality %>% 
  group_by(Month) %>% 
  summarise(
    Ozone_mean = 
      mean(Ozone, na.rm = T)
    ) ->
airquality_by_month

ggplot(airquality) + 
  aes(x = Ozone) + 
  geom_histogram() + 
  facet_grid(rows = vars(Month)) +
  geom_vline(data = airquality_by_month, 
             aes(xintercept = 
               Ozone_mean))

???

Further, for the case of adding a vertical line at the mean for different subsets of the data, a different approach is required. This enterprise may take instructor/analyst/student on an even larger detour -- possibly googling, and maybe landing on the following stack overflow page where 11,000 analytics souls (some repeats to be sure) have landed:

https://stackoverflow.com/questions/1644661/add-a-vertical-line-with-different-intercept-for-each-panel-in-ggplot2

The solutions to this problem involve data manipulation prior to plotting the data. The solution disrupts the forward flow ggplot build. One must take a pause, which may involve toggling back and forth between stack overflow solutions, disrupting momentum you are working on to talk about the pooled mean and the conditional mean.


class: inverse

From source .Rmd:


`r chunk_reveal("cond_means_hard", title = "#### Conditional means (may require a trip to stackoverflow!)")`

r chunk_reveal("cond_means_hard", title = "#### Conditional means (may require a trip to stackoverflow!)")


class: inverse

From source .Rmd:


# "...mastery of advanced programming skills should not be allowed to crowd out data analysis skills or statistical thinking." -- GAISE report

--

![](https://media.giphy.com/media/ycBW5N5XMfpXTvamtF/giphy.gif)

"...mastery of advanced programming skills should not be allowed to crowd out data analysis skills or statistical thinking." -- GAISE report

--


class: inverse

From source .Rmd:


class: inverse, center, bottom
background-image: url(https://images.unsplash.com/photo-1482685945432-29a7abf2f466?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=1532&q=80)
background-size: cover

# But other statistical geometries in ggplot2 flow! 
--
... ggplot is known for being able to 'speak your plot into existence'.

class: inverse, center, bottom background-image: url(https://images.unsplash.com/photo-1482685945432-29a7abf2f466?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=1532&q=80) background-size: cover

But other statistical geometries in ggplot2 flow!

-- ... ggplot is known for being able to 'speak your plot into existence'.


class: inverse

From source .Rmd:


# Declarative

--

## "Let there be a a box plot..."

--

## and there was a boxplot.

--

## "Let there be LOESS smoothing..."

--

## and there was loess smoothing

Declarative

--

"Let there be a a box plot..."

--

and there was a boxplot.

--

"Let there be LOESS smoothing..."

--

and there was loess smoothing


class: inverse

From source .Rmd:


`r chunk_reveal("base_gg_flow")`

```r
ggplot(cars) + 
  aes(x = speed) + 
  aes(y = dist) +
  geom_point() + 
  geom_smooth() ->
g1

library(palmerpenguins)
ggplot(penguins) +
  aes(x = species) +
  aes(y = bill_length_mm) + 
  geom_point() +
  geom_boxplot()
```

r chunk_reveal("base_gg_flow")

ggplot(cars) + 
  aes(x = speed) + 
  aes(y = dist) +
  geom_point() + 
  geom_smooth() ->
g1

library(palmerpenguins)
ggplot(penguins) +
  aes(x = species) +
  aes(y = bill_length_mm) + 
  geom_point() +
  geom_boxplot()

class: inverse

From source .Rmd:


# But, infrastructure to *extend* ggplot2 ...

--


--

![](https://media.giphy.com/media/relSRdIMtKVcPrSfyT/giphy.gif)

But, infrastructure to extend ggplot2 ...

--

--


class: inverse

From source .Rmd:


## Build new statistical geometries that deliver the *flow* experience!

--
- # for students
--

- # for instructors
--

- # for analysts

Build new statistical geometries that deliver the flow experience!

-- - # for students --

- # for instructors


class: inverse

From source .Rmd:


class: center, inverse, middle

# Introducing {ggxmean}

--

https://github.com/EvaMaeRey/ggxmean

class: center, inverse, middle

Introducing {ggxmean}

--

https://github.com/EvaMaeRey/ggxmean


class: inverse

From source .Rmd:


`r chunk_reveal("geom_x_mean", title = "geom_x_mean()")`

```r
library(ggplot2)
library(ggxmean)
ggplot(airquality) + 
  aes(x = Ozone) + 
  geom_histogram() + 
  geom_x_mean() + 
  facet_grid(rows = 
               vars(Month))
```

r chunk_reveal("geom_x_mean", title = "geom_x_mean()")

library(ggplot2)
library(ggxmean)
ggplot(airquality) + 
  aes(x = Ozone) + 
  geom_histogram() + 
  geom_x_mean() + 
  facet_grid(rows = 
               vars(Month))

class: inverse

From source .Rmd:


`r chunk_reveal("univariate", title = "### Other univariate markers") `

```r
ggplot(airquality) + 
  aes(x = Ozone) + 
  geom_histogram() + 
  geom_x_median() + 
  geom_x_quantile(
    quantile = .25,
    linetype = "dashed"
    ) + 
  geom_x_percentile(
    percentile = 100,
    linetype = "dotted"
    )
```

r chunk_reveal("univariate", title = "### Other univariate markers")

ggplot(airquality) + 
  aes(x = Ozone) + 
  geom_histogram() + 
  geom_x_median() + 
  geom_x_quantile(
    quantile = .25,
    linetype = "dashed"
    ) + 
  geom_x_percentile(
    percentile = 100,
    linetype = "dotted"
    )

class: inverse

From source .Rmd:


`r chunk_reveal("lm_sequence", title = "### Relevant to OLS", widths = c(1,1)) `

```r
ggplot(data = cars) + 
  aes(speed, dist) + 
  geom_point() + #BREAK
  ggxmean::geom_lm() + #BREAK
  ggxmean::geom_lm_fitted(color = "blue",
                          size = 3) + #BREAK
  ggxmean::geom_lm_residuals() + #BREAK
  ggxmean::geom_lm_conf_int() + #BREAK
  ggxmean::geom_lm_intercept(color = "red",
                             size = 5) + #BREAK
  ggxmean::geom_lm_formula(size = 10) #BREAK
```

r chunk_reveal("lm_sequence", title = "### Relevant to OLS", widths = c(1,1))

ggplot(data = cars) + 
  aes(speed, dist) + 
  geom_point() + #BREAK
  ggxmean::geom_lm() + #BREAK
  ggxmean::geom_lm_fitted(color = "blue",
                          size = 3) + #BREAK
  ggxmean::geom_lm_residuals() + #BREAK
  ggxmean::geom_lm_conf_int() + #BREAK
  ggxmean::geom_lm_intercept(color = "red",
                             size = 5) + #BREAK
  ggxmean::geom_lm_formula(size = 10) #BREAK

class: inverse

From source .Rmd:


`r chunk_reveal("geom_normal_dist", title = "### fitting distributions", widths = c(1,1)) `

```r
ggplot(data = faithful) + 
  aes(waiting) + 
  geom_rug() + 
  geom_histogram(
    aes(y = ..density..)) + #BREAK
  geom_normal_dist(fill = "blue") + #BREAK
  facet_grid(rows = 
               vars(eruptions > 3)) #BREAK
```

r chunk_reveal("geom_normal_dist", title = "### fitting distributions", widths = c(1,1))

ggplot(data = faithful) + 
  aes(waiting) + 
  geom_rug() + 
  geom_histogram(
    aes(y = ..density..)) + #BREAK
  geom_normal_dist(fill = "blue") + #BREAK
  facet_grid(rows = 
               vars(eruptions > 3)) #BREAK

class: inverse

From source .Rmd:





class: inverse

From source .Rmd:


class: inverse, center, bottom
background-image: url(https://images.unsplash.com/photo-1482685945432-29a7abf2f466?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=1532&q=80)
background-size: cover


# Thank you!

--

## {ggxmean} development package: https://evamaerey.github.io/ggxmean/

--

# Feedback welcome!

class: inverse, center, bottom background-image: url(https://images.unsplash.com/photo-1482685945432-29a7abf2f466?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=1532&q=80) background-size: cover

Thank you!

--

{ggxmean} development package: https://evamaerey.github.io/ggxmean/

--

Feedback welcome!



EvaMaeRey/ggxmean documentation built on April 10, 2024, 6:32 p.m.