knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  echo = TRUE
)
library(dplyr)
library(magrittr)

First things first

We will need the ggplot2 package:

library(ggplot2)

We also need the "fruits" data:

data("fruits", package = "ReMUSE")

Choose your graph! {.centered}

From Data to Viz: https://www.data-to-viz.com/

A sample of ggplots {.center}


Bar Plot

Reminder: the barplot function

The base function for bar plots is barplot:

barplot(table(fruits$groupe))

With colors:

barplot(table(fruits$groupe), col = 1:4)

The geom_bar "function"

ggplot(data = fruits, aes(x = groupe, fill = groupe)) +
  geom_bar()

STOP!

Break down the command

Data {.center}


Aesthetic parameters {.center}


Geometries {.center}


What you need to remember {.center}


[G]rammar of [G]raphics {.center}


Implementation in ggplot2


| Element | Keyword | Description |
|---|---|---|
| Data | data | The data used to create the graph. Each row represents an object to add to the graph. |
| Geometry | geom_ | How to represent the objects: points, lines, surfaces, etc. |
| Aesthetics | aes() | Aesthetic parameters of the shapes: position, color, shape, size, etc. |
| Scale | scale_ | Functions that control how the shapes are created from the objects and the aesthetic parameters. For example, the function scale_color_manual lets users pick their own colors. |
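As a minimal sketch of how these four components fit together in one command, the example below uses a small toy data frame invented for illustration (it is not the ReMUSE "fruits" data, and the color choices are arbitrary):

```r
library(ggplot2)

# Toy data, invented for illustration only (not the ReMUSE "fruits" data)
toy <- data.frame(
  groupe = c("a", "a", "b", "b", "c"),
  Sucres = c(4, 12, 9, 15, 7),
  Eau    = c(90, 80, 85, 75, 88)
)

p <- ggplot(data = toy,                       # data: one row = one object
            aes(x = Sucres, y = Eau,          # aesthetics: position ...
                color = groupe)) +            # ... and color
  geom_point(size = 3) +                      # geometry: draw each row as a point
  scale_color_manual(                         # scale: choose the colors by hand
    values = c(a = "steelblue", b = "orange", c = "limegreen")
  )
p
```

Each component is added with `+`, so any of them can be swapped without touching the others.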


Your turn! {.columns-2}

Reproduce the graph on the right:

 ggplot(***, 
     aes(***, 
     fill = Sucres > 10)) +
   geom_***()


A little bit of history {.columns-2}

Hadley Wickham

Some geometries

We will now look at a few specific geometries used to create "classic" graphs.


| Geometry | Description |
|---|---|
| geom_bar | Bar plot on non-aggregated data |
| geom_col | Bar plot on existing counts |
| geom_histogram | Histogram of a quantitative variable |
| geom_boxplot | Tukey diagram, aka boxplot |
| geom_violin | "Violin" plot |
| geom_point | Scatter plot |
| geom_line | Line plot |


Bar plots

With geom_bar

We already know how to do it:

ggplot(fruits, aes(cut(Eau, c(0, 84.2, 100)))) + 
  geom_bar(fill = "steelblue")

With geom_col

When you already have counts.

dat.count <- data.frame(
  Fruit = c("Ananas", "Durian"),
  Nb = c(10, 20)
)

ggplot(data = dat.count, aes(x = Fruit, y = Nb)) +
  geom_col()

Your turn

Add colors to the previous bar plot!

Histograms

Histogram or bar plot? {.columns-2 .smaller}

ggplot(fruits, aes(Sucres)) + 
  geom_bar()
ggplot(fruits, aes(Sucres)) + 
  geom_histogram()

Histogram or bar plot? {.columns-2 .smaller}

Bar plot

To plot counts for:

Histogram

To plot counts or densities for:

In this case, it is very important to choose the intervals!

Default histogram {.columns-2 .smaller}


ggplot(fruits, aes(Sucres)) + 
  geom_histogram()

What does the message mean?

To create a histogram, one needs to distribute values into classes.
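The message reports that geom_histogram fell back on its default of 30 classes. As a minimal sketch, the number of classes or their width can be set explicitly with the `bins` or `binwidth` arguments. The values below are toy data invented for illustration:

```r
library(ggplot2)

# Toy values, invented for illustration only
toy <- data.frame(Sucres = c(1, 2, 2, 3, 5, 8, 9, 12, 14, 20))

# Option 1: fix the number of classes
h_bins <- ggplot(toy, aes(Sucres)) +
  geom_histogram(bins = 5)

# Option 2: fix the width of each class
h_width <- ggplot(toy, aes(Sucres)) +
  geom_histogram(binwidth = 5)

# layer_data() returns the computed classes, one row per class
n_classes <- nrow(layer_data(h_bins))
n_classes
```

Setting either argument also silences the message, since the choice is no longer left to the default.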

Modify the intervals

ggplot(fruits, aes(Sucres)) + 
  geom_histogram(breaks = seq(0, 75, 5))

Change the color

ggplot(fruits, aes(Sucres)) + 
  geom_histogram(breaks = seq(0, 75, 5),
                 fill = "steelblue")

Change the border color

ggplot(fruits, aes(Sucres)) + 
  geom_histogram(breaks = seq(0, 75, 5),
                 fill = "steelblue",
                 color = "white")

Boxplot


Boxplot

ggplot(data=fruits, aes(x = Sucres)) + 
  geom_boxplot()

Boxplot: link between a categorical variable and a quantitative variable

ggplot(data=fruits, aes(x=groupe, y=Sucres)) + 
  geom_boxplot()

Violins {.columns-2 .smaller}

ggplot(data=fruits, 
       aes(x = Sucres, y = 1)) + 
  geom_violin()

ggplot(data=fruits, 
       aes(x = groupe, y = Sucres)) + 
  geom_violin()

Your turn! {.columns-2}

Complete the code to obtain the graph on the right:

ggplot(fruits, 
       aes(x = Fibres > 1.5, 
           y = Proteines, 
           fill = ***)) + 
  geom_***()


Customization

Themes

Themes are pre-defined functions that change the appearance of ggplots:

Examples (theme_***()):

Example on a histogram: theme_bw()

ggplot(fruits, aes(Fibres)) + 
  geom_histogram() + 
  theme_bw()

Example on a histogram: theme_minimal()

ggplot(fruits, aes(Fibres)) + 
  geom_histogram() + 
  theme_minimal()

Example on a histogram: theme_void()

ggplot(fruits, aes(Fibres)) + 
  geom_histogram() + 
  theme_void()

Your turn! {.columns-2}

  1. Consult the help page for theme_bw with the command ?theme_bw
  2. Choose the appropriate theme to obtain the result on the right.

ggplot(fruits, aes(y = Fibres)) + 
  geom_boxplot() + 
  theme_***()


Other "simple" customization

... or use the wrapper function labs to go even faster:

labs(
  title = "Graph title",
  subtitle = "Graph subtitle",
  x = "x-axis title",
  y = "y-axis title",
  color = "Color legend title",
  shape = "Shape legend title"
)

Advanced customization

With the function theme(): each element has to be defined according to its nature.

Some of the things one can change with theme()
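As a minimal sketch of what "according to its nature" means, the example below sets one text element, one line element and one rectangle element, and removes a fourth. The data are toy values invented for illustration, and the styling choices are arbitrary:

```r
library(ggplot2)

# Toy values, invented for illustration only
toy <- data.frame(Fibres = c(0.5, 1, 1.2, 2, 2.5, 3, 4.1))

p <- ggplot(toy, aes(Fibres)) +
  geom_histogram(bins = 4) +
  labs(title = "Fibres") +
  theme(
    plot.title       = element_text(face = "bold", size = 16), # a text element
    axis.line        = element_line(color = "grey30"),         # a line element
    panel.background = element_rect(fill = "white"),           # a rectangle element
    panel.grid.minor = element_blank()                         # remove an element
  )
p
```

Passing the wrong kind of element (for instance element_text() for panel.background) raises an error, which is why the nature of each element matters.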

Scatterplots

With geom_point

This geometry requires the $x$ and $y$ aesthetic parameters, and optionally accepts size, color and shape.

ggplot(fruits, aes(x = Phosphore, y = Calcium, size = Magnesium)) + 
  geom_point()

Aesthetic parameters

When they are specified in aes(), they map values from the dataset to a characteristic of the objects drawn on the graph.

Specified outside of aes(), they are set to a single fixed value that applies to every object in the layer!

Example {.columns-2 .smaller}

ggplot(fruits, 
       aes(x = Phosphore, y = Calcium, 
           color = Magnesium)) + 
  geom_point() + 
  theme(legend.position = "bottom")

ggplot(fruits, 
    aes(x = Phosphore, y = Calcium)) + 
  geom_point(color = "limegreen")

Your turn! {.columns-2 .smaller}

Complete the code to obtain the graph on the right:

ggplot(fruits,
       aes(x = Sucres, 
           y = Proteines, 
           *** = Magnesium, 
           *** = ***)) + 
  geom_***() + 
  ***(title = "Fruits",
     x = "Sucres (g/100 g)", 
     y = "Protéines, N x 6.25 (g/100 g)",
     size = "Magnésium\n(mg/100 g)",
     ***= "Groupe") + 
  theme_***()


Help, my dots are on top of one another! {.columns-2}

Don't panic, use opacity (aka alpha):

ggplot(fruits, 
       aes(x = Phosphore, 
           y = Calcium, 
           color = groupe)) + 
  geom_point(alpha = 0.5, 
             size = 2) + 
  theme_bw() + 
  theme(legend.position = 
          "bottom")


Changing the scales

With the scale_*** functions {.smaller}

They allow the user to customize a scale (for $x$ or $y$, but not only those)!
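As a minimal sketch, the example below customizes one position scale and one non-position scale. The data are toy values invented for illustration, and the breaks, colors and legend name are arbitrary choices:

```r
library(ggplot2)

# Toy values, invented for illustration only
toy <- data.frame(
  Phosphore = c(10, 20, 35, 50),
  Calcium   = c(5, 12, 30, 8),
  groupe    = c("a", "b", "a", "b")
)

p <- ggplot(toy, aes(Phosphore, Calcium, color = groupe)) +
  geom_point(size = 3) +
  # position scale: choose where the ticks fall on the y axis
  scale_y_continuous(breaks = c(0, 10, 20, 30)) +
  # color scale: pick the colors by hand and rename the legend
  scale_color_manual(values = c(a = "steelblue", b = "orange"),
                     name = "Group")
p
```

The function name follows the pattern scale_<aesthetic>_<type>, so the same idea carries over to fill, size or shape.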

Your turn! {.columns-2}

Complete the code to obtain the graph on the right:

ggplot(fruits, 
       aes(Phosphore, 
           Calcium)) + 
  geom_point(*** = "white") + 
  scale_***() + 
  scale_***() + 
  labs(x = "log10(Phosphore)",
       y = "log10(Calcium)") + 
  theme_dark()


With the coord_*** functions

They allow the user to change the coordinate system after applying all the scaling transformations (with scale_*** functions). For example:
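As a minimal sketch, the same bar plot can be drawn in different coordinate systems without touching the data or the geometry. The data frame is a toy one invented for illustration:

```r
library(ggplot2)

# Toy values, invented for illustration only
toy <- data.frame(groupe = c("a", "a", "a", "b", "b", "c"))

p_bar <- ggplot(toy, aes(groupe)) +
  geom_bar()

p_flip  <- p_bar + coord_flip()   # swap the x and y axes (horizontal bars)
p_polar <- p_bar + coord_polar()  # draw the same bars in polar coordinates

p_flip
p_polar
```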

With the *lim* functions

They allow the user to specify the limits (minimum and maximum) of a given axis. Caution: values outside these limits are removed from the graph!

To "zoom in" without losing data, use coord_cartesian or scale_***.
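The difference can be checked on a small example. This is a minimal sketch on toy data invented for illustration, comparing xlim() with coord_cartesian() and counting how many points survive in each case:

```r
library(ggplot2)

# Toy values, invented for illustration only
toy <- data.frame(x = 1:10, y = (1:10)^2)

base <- ggplot(toy, aes(x, y)) +
  geom_point()

# xlim(): values outside [3, 7] are replaced by NA and are not drawn
p_cut <- base + xlim(3, 7)

# coord_cartesian(): all values are kept, only the visible window changes
p_zoom <- base + coord_cartesian(xlim = c(3, 7))

n_cut  <- sum(!is.na(layer_data(p_cut)$x))   # points still usable after xlim()
n_zoom <- sum(!is.na(layer_data(p_zoom)$x))  # points still usable after zooming
c(n_cut, n_zoom)
```

This matters most when a statistic is computed from the data (a boxplot or a smoothing curve, for example): with xlim() it is computed on the remaining values only.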

"Faceting"

With facet_wrap

Used to divide the graph into panels.

Careful about the syntax: it is based on vars.

To divide a graph g into several panels according to the value of a factor fac:

g + facet_wrap(facets = vars(fac))

One can also use a "formula":

g + facet_wrap(~ fac)

Example {.columns-2}

ggplot(fruits, 
       aes(x = Phosphore, 
           y = Calcium, 
           color = groupe)) + 
  geom_point() + 
  facet_wrap(vars(Sucres > 10)) + 
  theme_bw() + 
  theme(legend.position = 
          "bottom")

ggplot(fruits, 
       aes(x = Phosphore, 
           y = Calcium, 
           color = groupe)) + 
  geom_point() + 
  facet_wrap(vars(Sucres > 10)) + 
  theme_linedraw() + 
  theme(legend.position = 
          "bottom")

Or with facet_grid

It is used the same way as facet_wrap.

To divide a graph g into several panels according to the value of a factor factorow for the rows and factocol for the columns:

g + facet_grid(rows = vars(factorow), cols = vars(factocol))

One can also use a "formula":

g + facet_grid(factorow ~ factocol)

A PIECE OF ADVICE: when using faceting, be careful about the levels of the categorical variables that you are going to use.
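One reason the levels matter is that panels are laid out in the order of the factor levels, which is alphabetical by default for character values. As a minimal sketch, the example below sets the order explicitly. The data and the column name `size_class` are hypothetical, invented for illustration:

```r
library(ggplot2)

# Toy values, invented for illustration only; `size_class` is a hypothetical column
toy <- data.frame(
  x = c(1, 2, 3, 4),
  y = c(2, 1, 4, 3),
  size_class = c("small", "large", "medium", "small")
)

# Without this step the panels would appear as: large, medium, small
toy$size_class <- factor(toy$size_class,
                         levels = c("small", "medium", "large"))

p <- ggplot(toy, aes(x, y)) +
  geom_point() +
  facet_wrap(vars(size_class))  # panels follow the order of the levels
p
```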

Save a graph

The easiest method: ggsave

Usage and example:

g <- ggplot(fruits, aes(groupe)) + geom_bar()
ggsave(filename = "mongraphe.png", plot = g)

The extension given in filename is used automatically to save the graph in the matching format!
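As a minimal sketch, the same graph can be written to two formats by changing only the extension. The data are toy values invented for illustration, the file names are hypothetical, and the files go to a temporary directory so that nothing is written to the working directory:

```r
library(ggplot2)

# Toy values, invented for illustration only
toy <- data.frame(groupe = c("a", "a", "b"))
g <- ggplot(toy, aes(groupe)) +
  geom_bar()

# Hypothetical file names, written to a temporary directory
png_file <- file.path(tempdir(), "toy_graph.png")
pdf_file <- file.path(tempdir(), "toy_graph.pdf")

# Same plot, two formats: only the file extension changes
ggsave(filename = png_file, plot = g,
       width = 12, height = 8, units = "cm", dpi = 300)
ggsave(filename = pdf_file, plot = g,
       width = 12, height = 8, units = "cm")
```

Setting width, height and units explicitly makes the output reproducible, since the default size otherwise depends on the current graphics device.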

Conclusion

ggplot2 is very complete:



vguillemot/ReMUSE documentation built on Dec. 23, 2021, 3:09 p.m.