Communication
In r4ds.tutorials: Tutorials for "R for Data Science"

library(learnr)
library(tutorial.helpers)
library(tidyverse)
library(scales)
library(ggrepel)
library(patchwork)

knitr::opts_chunk$set(echo = FALSE)
options(tutorial.exercise.timelimit = 60, 
        tutorial.storage = "local") 

labels_tib <- tibble(
  start = 1:10,
  end = cumsum(start^2)
)

label_info <- mpg |>
  arrange(desc(displ)) |>
  slice_head(n = 1, by = drv) |>
  mutate(
    drive_type = case_when(
      drv == "f" ~ "front-wheel drive",
      drv == "r" ~ "rear-wheel drive",
      drv == "4" ~ "4-wheel drive"
    )
  ) |>
  select(displ, hwy, drv, drive_type)

potential_outliers <- mpg |>
  filter(hwy > 40 | (hwy > 20 & displ > 5))

trend_text <- "Larger engine sizes tend to have lower fuel economy." |>
  str_wrap(width = 30)

# For Legend layout section

base <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class))

random_vals <- tibble(
  random_x = rnorm(10000),
  random_y = rnorm(10000)
)

suv <- mpg |> filter(class == "suv")
compact <- mpg |> filter(class == "compact")

## Layout section

p1 <- ggplot(mpg, aes(x = drv, y = cty, color = drv)) + 
  geom_boxplot(show.legend = FALSE) + 
  labs(title = "Plot 1")

p2 <- ggplot(mpg, aes(x = drv, y = hwy, color = drv)) + 
  geom_boxplot(show.legend = FALSE) + 
  labs(title = "Plot 2")

p3 <- ggplot(mpg, aes(x = cty, color = drv, fill = drv)) + 
  geom_density(alpha = 0.5) + 
  labs(title = "Plot 3")

p4 <- ggplot(mpg, aes(x = hwy, color = drv, fill = drv)) + 
  geom_density(alpha = 0.5) + 
  labs(title = "Plot 4")

p5 <- ggplot(mpg, aes(x = cty, y = hwy, color = drv)) + 
  geom_point(show.legend = FALSE) + 
  facet_wrap(~drv) +
  labs(title = "Plot 5")

Introduction

This tutorial covers Chapter 11: Communication from R for Data Science (2e) by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund. In this tutorial we will be making use of three packages associated with ggplot2: scales, ggrepel, and patchwork. Key commands include quote(), which simply returns its argument without evaluating it, and geom_label_repel(), which adds text labels directly to the plot while keeping them from overlapping.

Labels

The easiest place to start when turning an exploratory graphic into an expository graphic is with good labels. You add labels with the labs() function. We will create this plot:

p_1 <- mpg |> 
  ggplot(aes(x = displ, 
             y = hwy)) +
    geom_point(aes(color = class)) +
    geom_smooth(se = FALSE) +
    labs(
      x = "Engine displacement (L)",
      y = "Highway fuel economy (mpg)",
      color = "Car type",
      title = "Fuel efficiency generally decreases with engine size",
      subtitle = "Two seaters (sports cars) are an exception because of their light weight",
      caption = "Data from fueleconomy.gov"
    )

p_1

Exercise 1

Load the tidyverse package using the library() function.

library(...)

library(tidyverse)

In previous tutorials, you learned how to use plots as tools for exploration. When you make exploratory plots, you know, even before looking, which variables the plot will display. You made each plot for a purpose, could quickly look at it, and then moved on to the next plot. In the course of most analyses, you’ll produce tens or hundreds of plots, most of which are immediately thrown away.

Exercise 2

Load the scales package using the library() function.

library(...)

library(scales)

The scales package is used to override the default breaks, labels, transformations and palettes.

You need to communicate your understanding to others. Your audience will likely not share your background knowledge and will not be deeply invested in the data.

Exercise 3

Load the ggrepel package using library().

library(...)

library(ggrepel)

The ggrepel package will automatically adjust labels so that they don’t overlap. To help others quickly build up a good mental model of the data, you will need to invest considerable effort in making your plots as self-explanatory as possible.

Exercise 4

Load the patchwork package using library().

library(...)

library(patchwork)

The patchwork package allows you to combine separate plots into the same graphic.
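As a small preview of what combining plots looks like, here is a minimal sketch. The object names left, right, and combined are made up for this illustration and are not used elsewhere in the tutorial.

```r
library(ggplot2)
library(patchwork)

# Two ordinary ggplot objects built from the mpg data
left  <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()
right <- ggplot(mpg, aes(x = drv, y = hwy)) + geom_boxplot()

# With patchwork loaded, + places the two plots side by side
combined <- left + right
combined
```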

We recommend pairing this tutorial with a good general visualization book. We particularly like The Truthful Art, by Alberto Cairo. It doesn’t teach the mechanics of creating visualizations, but instead focuses on what you need to think about in order to create effective graphics.

Exercise 5

Now, let's explore the dataset that we want to create a plot on. Type in mpg and hit "Run Code".

mpg

mpg

The mpg dataset provides fuel economy data from 1999 and 2008 for 38 popular models of cars. The dataset ships with the ggplot2 package.

Exercise 6

Pipe mpg to ggplot().

mpg |> 
  ...

mpg |> 
  ggplot()

As always, ggplot() alone, without the use of the aes() function as an argument to mapping, produces an empty rectangle.

Exercise 7

Within ggplot(), using aes(), set the x argument to the displ variable and the y argument to the hwy variable.

mpg |> 
  ggplot(aes(x = ..., y = ...))

mpg |> 
  ggplot(aes(x = displ, y = hwy))

Exercise 8

Add geom_point() to the pipeline.

... +
  geom_point()

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
  geom_point()

We can finally see some data. Although there is more plotting to do, it is not too early to start thinking about the labels we will be using.

Exercise 9

Within geom_point(), using aes(), set the color argument to the class variable.

... +
  geom_point(aes(color = ...))

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
  geom_point(aes(color = class))

The purpose of a plot title is to summarize the main finding. Avoid titles that just describe what the plot is, e.g., “A scatterplot of engine displacement vs. fuel economy”.

Exercise 10

Add geom_smooth() to the pipeline.

... +
  geom_smooth()

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth()

The subtitle adds additional detail in a smaller font beneath the title. If there is one key conclusion which readers should come away with, spell it out in the subtitle.

Exercise 11

Within geom_smooth(), add the se argument and set it to FALSE.

... +
  geom_smooth(se = ...)

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(se = FALSE)

The caption adds text at the bottom right of the plot, often used to describe the source of the data. Any plot you make should have a caption since your readers will always want to know where the data come from.

Exercise 12

Add a title, subtitle, x axis title, y axis title, legend title (color), and a caption by adding labs() to the pipeline.

... +
  labs(
    title = ...,
    subtitle = ...,
    x = ...,
    y = ...,
    color = ...,
    caption = ...
  )

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(se = FALSE) + 
  labs(
    x = "Engine displacement (L)",
    y = "Highway fuel economy (mpg)",
    color = "Car type",
    title = "Fuel efficiency generally decreases with engine size",
    subtitle = "Two seaters (sports cars) are an exception because of their light weight",
    caption = "Data from fueleconomy.gov"
  )

Reminder: The graphic should look something like this

p_1

Exercise 13

Let's move on to a new plot. Type in labels_tib and hit "Run Code".

labels_tib

labels_tib

The tibble labels_tib is a premade tibble that we will use to show that labels can also be mathematical expressions instead of text strings: replace the quoted string ("") with a call to quote().
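To make the role of quote() a little more concrete, here is a minimal sketch in base R. The name label_expr is hypothetical and used only for this illustration.

```r
# quote() returns the expression itself rather than its value,
# so neither x nor i needs to exist for this to run.
label_expr <- quote(x[i])

class(label_expr)   # "call"
label_expr          # prints the unevaluated expression x[i]
```

Because the label is stored as an unevaluated expression rather than a string, it can be drawn with mathematical formatting (here, a subscript) instead of as literal text.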

Exercise 14

Pipe labels_tib to the function ggplot().

labels_tib |> 
  ...()

labels_tib |> 
  ggplot()

Exercise 15

Within this function, using aes(), set the x argument to the start variable and set the y argument to the end variable.

labels_tib |> 
  ggplot(aes(x = start, y = ...))

labels_tib |> 
  ggplot(aes(x = start, y = end))

Exercise 16

Add the geom_point() function to the pipeline.

... +
  geom_point()

labels_tib |> 
  ggplot(aes(x = start, y = end)) + 
  geom_point()

Exercise 17

Using labs(), set the x axis title to quote(x[i]).

... +
  labs(
    x = ...
  )

labels_tib |> 
  ggplot(aes(x = start, y = end)) + 
  geom_point() +
  labs(
    x = quote(x[i])
  )

Exercise 18

Within labs(), set the y axis title to quote(sum(x[i] ^ 2, i == 1, n)).

... +
  labs(
    x = quote(x[i]),
    y = ...
  )

labels_tib |> 
  ggplot(aes(x = start, y = end)) + 
  geom_point() +
  labs(
    x = quote(x[i]),
    y = quote(sum(x[i] ^ 2, i == 1, n))
  )

Exercise 19

Type ?plotmath in the Console to learn more about the syntax that quote() labels can use. Copy/paste one interesting piece of syntax and its meaning into the textbox below and hit "Submit".

question_text(NULL,
    answer(NULL, correct = TRUE),
    allow_retry = TRUE,
    try_again_button = "Edit Answer",
    incorrect = NULL,
    rows = 6)

Annotations

In addition to labeling major components of your plot, it’s often useful to label individual observations or groups of observations. The first tool you have at your disposal is geom_text(), a function similar to geom_point(), but with an additional aesthetic: label. This makes it possible to add text labels to your plots.

In this section, we will make plots which look like this:

p_2 <- 
  mpg |>   
    ggplot(aes(x = displ, 
               y = hwy, 
               color = drv)) +
      geom_point(alpha = 0.3) +
      geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
      geom_text(
        data = label_info, 
        mapping = aes(x = displ, 
                      y = hwy, 
                      label = drive_type),
        fontface = "bold", 
        size = 5, 
        hjust = "right", 
        vjust = "bottom"
      ) +
      theme(legend.position = "none")

p_2

Exercise 1

There are two possible sources of labels. First, you might have a tibble that provides labels. Type label_info and hit "Run Code".

label_info

label_info

label_info is a tibble with information about three specific cars from mpg. We can use this tibble to directly label the three groups and replace the legend by placing labels directly on the plot.

Exercise 2

Create a new pipeline. Pipe mpg to the ggplot(). Using aes(), map x to displ and y to hwy. Add geom_point().

mpg |> 
  ggplot(aes(x = ..., 
             y = ...)) +
    ...()

mpg |>   
  ggplot(aes(x = displ, 
             y = hwy)) +
    geom_point()

This is not a bad plot, but it fails to use all the data we have about individual cars.

Exercise 3

Within aes(), map color to drv.

mpg |>   
  ggplot(aes(x = displ, 
             y = hwy,
             ... = ...)) +
    geom_point()

mpg |>   
  ggplot(aes(x = displ, 
             y = hwy,
             color = drv)) +
    geom_point()

Using color as another "dimension" allows us to see another pattern. Front-wheel drive cars (f) have smaller engines and get better mileage. But notice how hard this is to read for a new viewer. How are they to know what f means?

Exercise 4

Within geom_point(), add the alpha argument and set it equal to 0.3.

... + 
  geom_point(alpha = ...)

mpg |>   
  ggplot(aes(x = displ, 
             y = hwy,
             color = drv)) +
    geom_point(alpha = 0.3)

Using alpha helps, first, to highlight where the data is densest and, second, to lessen the busyness of the plot, the better to allow space for labels.

Exercise 5

Add geom_smooth() with method set to "loess", formula to y ~ x, and se to FALSE.

... +
  geom_smooth(method = ..., ... = y ~ x, se = ...)

mpg |>   
  ggplot(aes(x = displ, 
             y = hwy,
             color = drv)) +
    geom_point(alpha = 0.3) +
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE)

The lines cover different regions because, for example, there are no front-wheel drive (f) cars with engines much larger than 5 liters.

Exercise 6

This plot is not bad, but reading it would be easier if the labels were included in the graphic itself.

Add geom_text() to the plot. Within geom_text(), add the data argument and set it equal to label_info. Set the mapping argument equal to aes(label = drive_type).

... +
  geom_text(
    data = ...,
    mapping = ...
  )

mpg |>   
  ggplot(aes(x = displ, 
             y = hwy,
             color = drv)) +
    geom_point(alpha = 0.3) +
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
    geom_text(
      data = label_info,
      mapping = aes(label = drive_type)
    )

There are two improvements. First, instead of opaque abbreviations like 4 or f, we are now using words, like "4-wheel drive" and "front-wheel drive" that most viewers will understand. Second, that information is in the plot itself, rather than off to the side in the legend.

Exercise 7

We have control over how our labels look. Within your call to geom_text(), set the fontface argument to "bold" and the size argument to 5.

... + 
  geom_text(
    data = label_info, 
    mapping = aes(label = drive_type), 
    fontface = ...,
    ... = 5
  )

mpg |>   
  ggplot(aes(x = displ, 
             y = hwy,
             color = drv)) +
    geom_point(alpha = 0.3) +
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
    geom_text(
      data = label_info,
      mapping = aes(label = drive_type),
      fontface = "bold",
      size = 5
    )

With the fontface and size arguments we can customize the look of the text labels. In this case, these arguments make the labels bold and larger than the rest of the text on the plot.

Exercise 8

The labels are easier to read, but they are still misplaced. Within geom_text(), set the hjust argument to "right" and the vjust argument to "bottom".

... +
  geom_text(
      data = label_info,
      mapping = aes(label = drive_type),
      fontface = "bold",
      size = 5, 
      hjust = ..., 
      ... = "bottom"
    )

mpg |>   
  ggplot(aes(x = displ, 
             y = hwy,
             color = drv)) +
    geom_point(alpha = 0.3) +
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
    geom_text(
      data = label_info,
      mapping = aes(label = drive_type),
      fontface = "bold",
      size = 5, 
      hjust = "right", 
      vjust = "bottom"
    )

We use hjust (horizontal justification) and vjust (vertical justification) to control the alignment of the label.

Exercise 9

Add theme() to the pipeline. Within theme(), add the argument legend.position and set it to "none".

... + 
  theme(legend.position = ...)

mpg |>   
  ggplot(aes(x = displ, 
             y = hwy,
             color = drv)) +
    geom_point(alpha = 0.3) +
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
    geom_text(
      data = label_info,
      mapping = aes(label = drive_type),
      fontface = "bold",
      size = 5, 
      hjust = "right", 
      vjust = "bottom"
    ) +
    theme(legend.position = "none")

Exercise 10

The annotated plot we made is hard to read because the labels overlap with each other and with the points. We can use the geom_label_repel() function from the ggrepel package to address both of these issues. This useful package will automatically adjust labels so that they don’t overlap.

Change geom_text() to geom_label_repel(). Remove the hjust and vjust arguments. Also, set the nudge_y argument to 2.

... + 
    geom_..._repel(
      data = label_info,
      mapping = aes(label = drive_type),
      fontface = "bold",
      size = 5,
      nudge_y = 2
    ) +
  ...

mpg |>   
  ggplot(aes(x = displ, 
             y = hwy,
             color = drv)) +
    geom_point(alpha = 0.3) +
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
    geom_label_repel(
      data = label_info,
      mapping = aes(label = drive_type),
      fontface = "bold",
      size = 5,
      nudge_y = 2
    ) +
    theme(legend.position = "none")

You can also use the same idea to highlight certain points on a plot with geom_text_repel() from the ggrepel package.

Exercise 11

Type potential_outliers and hit "Run Code".

potential_outliers

This is a tibble of cars with either hwy > 40 or hwy > 20 & displ > 5. In other words, they are outside of the typical relationship between hwy and displ.

Exercise 12

Before identifying these outliers in the graphic, we need to set up our initial plot. With a new pipeline, pipe mpg to ggplot(). Within this ggplot(), using aes(), map x to displ and y to hwy. Add geom_point() so that the data is displayed.

... |> 
  ggplot(aes(x = ..., y = hwy)) +
    ...()

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
    geom_point()

In addition to geom_text() and geom_label(), you have many other geoms in ggplot2 available to help annotate your plot. For example, you can use geom_hline() and geom_vline() to add reference lines. We often make them thick (linewidth = 2) and white (color = "white"), and draw them underneath the primary data layer. That makes them easy to see, without drawing attention away from the data.
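As a rough sketch of the reference-line idea described above, the code below draws thick white lines beneath the points. The intercepts (hwy = 20 and displ = 5) and the name p_ref are arbitrary choices made for illustration, not thresholds taken from the tutorial.

```r
library(ggplot2)

# The reference lines are added before geom_point(), so they are drawn
# underneath the data layer. The intercept values are illustrative assumptions.
p_ref <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_hline(yintercept = 20, linewidth = 2, color = "white") +
  geom_vline(xintercept = 5, linewidth = 2, color = "white") +
  geom_point()

p_ref
```

On the default gray panel background, white lines stay visible without competing with the black points.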

Exercise 13

Add the geom_text_repel() to the pipeline. Within geom_text_repel(), set the data argument equal to potential_outliers and the mapping argument to aes(label = model).

... +
  geom_text_repel(data = ...,
                  mapping = aes(... = model))

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
    geom_point() +
    geom_text_repel(data = potential_outliers, 
                    mapping = aes(label = model))

Our graphic is combining information from two different data sets: mpg and potential_outliers. ggplot() is, implicitly, assigning mpg as the value of data. geom_text_repel() is, instead, using potential_outliers.

Exercise 14

Using your previous code, add the geom_point() function to the pipeline. Within this function, set data equal to potential_outliers and color equal to "red".

... +
  geom_point(data = ...,
             color = "...")

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
    geom_point() +
    geom_text_repel(data = potential_outliers, 
                    mapping = aes(label = model)) +
    geom_point(data = potential_outliers,
               color = "red")

You can use geom_segment() with the arrow argument to draw attention to a point with an arrow. Use aesthetics x and y to define the starting location, and xend and yend to define the end location.
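A minimal sketch of such an arrow might look like the following. The coordinates in arrow_df were picked by eye so that the arrow points toward the high-mileage cars in the upper left; arrow_df and p_arrow are hypothetical names used only here.

```r
library(ggplot2)

# One-row data frame describing a single arrow: (x, y) is the start,
# (xend, yend) is the tip. The values are illustrative.
arrow_df <- data.frame(x = 3, y = 42, xend = 2, yend = 44)

p_arrow <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_segment(
    data = arrow_df,
    mapping = aes(x = x, y = y, xend = xend, yend = yend),
    arrow = grid::arrow(length = grid::unit(0.3, "cm")),
    color = "red"
  )

p_arrow
```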

Exercise 15

If we want to modify the outlier points further, it is often easier, or even necessary, to add another call to geom_point(). It may seem strange to have three separate calls to geom_point() in a single graphic, but making the graphic look exactly as we want often requires such gymnastics.

Add another call to geom_point(). Set data equal to potential_outliers, color equal to "red", size equal to 3, and shape equal to "circle open".

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
    geom_point() +
    geom_text_repel(data = potential_outliers, 
                    mapping = aes(label = model)) +
    geom_point(data = potential_outliers,
               color = "red") +
    geom_point(data = potential_outliers,
               color = "red",
               size = 3,
               shape = "circle open")

We added a second layer of large, hollow points to further highlight the labelled points.

Another geom that can help annotate your plot is geom_rect(). You can use geom_rect() to draw a rectangle around points of interest. The boundaries of the rectangle are defined by aesthetics xmin, xmax, ymin, ymax. Alternatively, look into the ggforce package, specifically geom_mark_hull(), which allows you to annotate subsets of points with hulls.
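For instance, a rectangle around the large-engine cars with unusually good mileage could be sketched as below. The bounds stored in box were chosen by eye and are assumptions for illustration; box and p_rect are hypothetical names.

```r
library(ggplot2)

# Rectangle boundaries, chosen by eye for illustration
box <- data.frame(xmin = 5, xmax = 7.2, ymin = 21, ymax = 27)

p_rect <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_rect(
    data = box,
    mapping = aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
    inherit.aes = FALSE,  # don't inherit x = displ, y = hwy from ggplot()
    fill = NA,            # outline only, so the points stay visible
    color = "red"
  )

p_rect
```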

Exercise 16

Another handy function for adding annotations to plots is annotate(). As a rule of thumb, geoms are generally useful for highlighting a subset of the data while annotate() is useful for adding one or few annotation elements to a plot.

Create a new variable called trend_text and set it to "Larger engine sizes tend to have lower fuel economy."

trend_text <- "Larger engine sizes tend to have lower fuel economy."

trend_text <- ...

trend_text <- "Larger engine sizes tend to have lower fuel economy."

To demonstrate using annotate(), let’s create some text to add to our plot. The text is a bit long, so we’ll use stringr::str_wrap() to automatically add line breaks to it given the number of characters you want per line.

Exercise 17

Using your previous code, pipe the string assigned to trend_text into the str_wrap() function. Within this function, add the width argument and set it to 30.

trend_text <- ... |> 
  str_wrap(width = ...)

trend_text <- "Larger engine sizes tend to have lower fuel economy." |> 
  str_wrap(width = 30)
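To see what str_wrap() changed, it can help to print the result. The sketch below stores the wrapped string under a hypothetical name, wrapped, so that it stands on its own.

```r
library(stringr)

wrapped <- str_wrap(
  "Larger engine sizes tend to have lower fuel economy.",
  width = 30
)

# writeLines() renders the embedded "\n" as an actual line break,
# which is how the text will appear on the plot.
writeLines(wrapped)

# Length of each resulting line
nchar(strsplit(wrapped, "\n")[[1]])
```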

Exercise 18

Now let's make use of this annotation within a plot. Create a new pipeline and pipe mpg to the ggplot() function. Within this function map x to displ and y to hwy using aes(). Add the geom_point() function to the pipeline.

mpg |> 
  ggplot(aes(x = ..., y = ...)) +
    geom_point()

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
    geom_point()

Exercise 19

Using your previous code, add the annotate() function to the pipeline. Within this function, set the argument label to trend_text, geom to "label", x to 3.5, and y to 38.

mpg |> 
  ggplot(aes(x = ..., y = ...)) +
  geom_point() +
  annotate(
    label = ...,
    geom = ...,
    x = ..., 
    y = ...
  )

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
    geom_point() +
    annotate(label = trend_text,
             geom = "label",
             x = 3.5,
             y = 38)

Now, let's align it horizontally using the argument hjust.

Exercise 20

Using your previous code, within your most recent call to the annotate() function, add the hjust argument and set it to "left", then add the color argument and set it to "red".

... +
  annotate(
    geom = ...,
    x = ..., 
    y = ...,
    label = ..., 
    hjust = ...,
    color = ...
  )

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
    geom_point() +
    annotate(label = trend_text,
             geom = "label",
             x = 3.5,
             y = 38,
             hjust = "left",
             color = "red")

Annotation is a powerful tool for communicating main takeaways and interesting features of your visualizations. The only limit is your imagination (and your patience with positioning annotations to be aesthetically pleasing)!

Scales

In this section, we will learn how to display plots more appropriately by using scales. The most common use of the scales package is to customize the appearance of axis and legend labels. Use a break_* function to control how breaks are generated from the limits, and a label_* function to control how breaks are turned into labels.

Exercise 1

When you make a regular plot, ggplot2 automatically adds scales for you. Let's make a simple plot to demonstrate this.

Pipe mpg to the ggplot() function. Within this function map x to displ and y to hwy within aes(). Finish the pipeline with geom_point() with aes(color = class) inside of it.

mpg |> 
  ggplot(aes(x = ..., y = ...)) +
    geom_point(aes(... = class))

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
    geom_point(aes(color = class))

ggplot2 automatically adds default scales behind the scenes such as scale_x_continuous(), scale_y_continuous(), and scale_color_discrete().

Exercise 2

We can test the previous statement by adding those functions at the end of our previous pipeline.

Using your previous code, add the scale_x_continuous(), scale_y_continuous(), and the scale_color_discrete() functions to the pipeline.

... +
    scale_x_continuous() +
    ... +
    scale_color_discrete()

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
    geom_point(aes(color = class)) +
    scale_x_continuous() +
    scale_y_continuous() +
    scale_color_discrete()

We can see that the two plots are the same. Note the naming scheme for scales: scale_ followed by the name of the aesthetic, then _, then the name of the scale. The default scales are named according to the type of variable they align with: continuous, discrete, datetime, or date.

Exercise 3

Create a new pipeline and pipe mpg to the ggplot() function. Within this function, map x to displ, y to hwy, and color to drv using aes(). Finish with geom_point().

mpg |> 
  ggplot(aes(x = ..., y = ..., color = ...)) +
    ...

mpg |> 
  ggplot(aes(x = displ, y = hwy, color = drv)) +
    geom_point()

There are two primary arguments that affect the appearance of the ticks on the axes and the keys on the legend: breaks and labels. Breaks controls the position of the ticks, or the values associated with the keys.

Exercise 4

Using your previous code, add the scale_y_continuous() function to the pipeline. Within this function add the breaks argument and set it to seq(15, 40, by = 5).

... +
    scale_y_continuous(breaks = ...)

mpg |> 
  ggplot(aes(x = displ, y = hwy, color = drv)) +
    geom_point() +
    scale_y_continuous(breaks = seq(15, 40, by = 5))

Labels controls the text label associated with each tick/key. The most common use of breaks is to override the default choice.

Exercise 5

Using your previous code, add the scale_x_continuous() function to the pipeline. Within this function, add the labels argument and set it to NULL.

... +
    scale_x_continuous(labels = ...)

mpg |> 
  ggplot(aes(x = displ, y = hwy, color = drv)) +
    geom_point() +
    scale_y_continuous(breaks = seq(15, 40, by = 5)) +
    scale_x_continuous(labels = NULL)

You can use labels in the same way (a character vector the same length as breaks), but you can also set it to NULL to suppress the labels altogether. This can be useful for maps, or for publishing plots where you can’t share the absolute numbers.
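As a sketch of the character-vector form, the code below supplies one label per break. The break positions and the label text are illustrative choices, and p_lab is a hypothetical name.

```r
library(ggplot2)

# breaks and labels must have the same length: one label per tick
p_lab <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  scale_y_continuous(
    breaks = c(20, 30, 40),
    labels = c("20 mpg", "30 mpg", "40 mpg")
  )

p_lab
```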

Exercise 6

Using your previous code, add the scale_color_discrete() function to the pipeline. Within this function, add the labels argument and set it to c("4" = "4-wheel", "f" = "front", "r" = "rear").

... +
    scale_color_discrete(labels = ...)

mpg |> 
  ggplot(aes(x = displ, y = hwy, color = drv)) +
    geom_point() +
    scale_y_continuous(breaks = seq(15, 40, by = 5)) +
    scale_x_continuous(labels = NULL) +
    scale_color_discrete(labels = c("4" = "4-wheel", "f" = "front", "r" = "rear"))

You can also use breaks and labels to control the appearance of legends. For discrete scales for categorical variables, labels can be a named vector that maps the existing level names to the desired labels, as in the code above.

Exercise 7

This time we will be using the scales package to create more readable axis labels. Create a new pipeline and pipe diamonds to the ggplot() function. Within this function, map x to price and y to cut. Add the geom_boxplot() function to the pipeline. Within this function add the alpha argument and set it to 0.05.

diamonds |> 
  ggplot(aes(x = ..., y = ...)) +
    geom_boxplot(alpha = ...)

diamonds |> 
  ggplot(aes(x = price, y = cut)) +
    geom_boxplot(alpha = 0.05)

Using alpha to adjust the transparency of individual data points is almost always a good idea if you have thousands of observations.

Exercise 8

Using your previous code, add the scale_x_continuous() function. Within this function add the labels argument and set it equal to the scales::label_dollar() function.

... +
    scale_x_continuous(labels = ...)

diamonds |> 
  ggplot(aes(x = price, y = cut)) +
    geom_boxplot(alpha = 0.05) +
    scale_x_continuous(labels = scales::label_dollar())

The labels argument coupled with labeling functions from the scales package is also useful for formatting numbers as currency, percent, etc. The plot shows default labeling with label_dollar(), which adds a dollar sign as well as a thousand separator comma.

If the scales package were already loaded with library(), then it would not have been necessary to use the double colon notation, as in scales::label_dollar(). Just label_dollar() would have produced the same result.
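One way to preview what a labeling function will do is to call it directly on a few numbers. Each label_*() function returns another function, which is what the labels argument receives. The input values below are arbitrary examples.

```r
library(scales)

# label_dollar() builds a formatter; calling the formatter on numbers
# returns the strings that would appear on the axis.
label_dollar()(c(1000, 25000))   # "$1,000" "$25,000"

# The same pattern works for percentages.
label_percent()(c(0.25, 0.5))    # "25%" "50%"
```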

Exercise 9

Type in presidential and hit "Run Code".

presidential

presidential

The presidential dataset gives the name of each president, the start and end dates of their term, and their party, for the 12 US presidents from Eisenhower to Trump.

Exercise 10

Create a new pipeline and pipe presidential to the mutate() function. Within this function, create a new variable id and set it equal to row_number(). Add the ggplot() function to the pipeline. Within this function, map x to start and y to id using aes(). Add the geom_point() function to the pipeline.

presidential |> 
  mutate(id = ...) |> 
  ggplot(aes(x = ..., ... = id)) +
    geom_point()

presidential |> 
  mutate(id = row_number()) |> 
  ggplot(aes(x = start, y = id)) +
    geom_point()

id is simply the row number of each presidency in the dataset: Eisenhower is the first president listed, Kennedy the second, and so on.

Exercise 11

Using your previous code, add the geom_segment() function. Within this function, map xend to end and yend to id.

... +
  geom_segment(aes(xend = ..., yend = ...))

presidential |> 
  mutate(id = row_number()) |> 
  ggplot(aes(x = start, y = id)) +
    geom_point() +
    geom_segment(aes(xend = end, yend = id))

Another use of breaks is when you have relatively few data points and want to highlight exactly where the observations occur.

Exercise 12

Using your previous code, add the scale_x_date() function. Within this function add the argument name and set it equal to NULL.

... +
  scale_x_date(name = ...)

presidential |> 
  mutate(id = row_number()) |> 
  ggplot(aes(x = start, y = id)) +
    geom_point() +
    geom_segment(aes(xend = end, yend = id)) +
    scale_x_date(name = NULL)

We don't need to explicitly label the x-axis since the context makes it clear that these are years.

Exercise 13

Using your previous code, within your call to the function scale_x_date(), add the argument breaks and set it equal to presidential$start.

... +
  scale_x_date(name = NULL, breaks = ...)

presidential |> 
  mutate(id = row_number()) |> 
  ggplot(aes(x = start, y = id)) +
    geom_point() +
    geom_segment(aes(xend = end, yend = id)) +
    scale_x_date(name = NULL, breaks = presidential$start)

Note that for the breaks argument we pulled out the start variable as a vector with presidential$start because we can’t do an aesthetic mapping for this argument. Also note that the specification of breaks and labels for date and datetime scales is a little different.

Exercise 14

Using your previous code, within your call to the function scale_x_date(), add the argument date_labels and set it equal to "'%y".

... +
      scale_x_date(name = NULL, 
                   breaks = presidential$start,
                   ... = "'%y")

presidential |> 
  mutate(id = row_number()) |> 
  ggplot(aes(x = start, y = id)) +
    geom_point() +
    geom_segment(aes(xend = end, yend = id)) +
    scale_x_date(name = NULL, 
                 breaks = presidential$start,
                 date_labels = "'%y")

The argument date_labels takes a format specification, in the same form as parse_datetime(). The plot shows when each US president started and ended their term.
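The format codes can be tried out on a single date with base R's format(), which understands the same strftime-style codes. The sketch below uses January 20, 1953, the start of Eisenhower's term; the name inauguration is hypothetical.

```r
inauguration <- as.Date("1953-01-20")

# "%y" is the two-digit year; the leading apostrophe is copied through literally.
format(inauguration, "'%y")   # "'53"

# "%Y" is the four-digit year.
format(inauguration, "%Y")    # "1953"
```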

Legend Layout

In this section we will learn how to adjust the placement of the legend and how to customize the legend of your plot.

Exercise 1

First create a new variable called base. Set base equal to the dataset mpg piped to the ggplot() function. Within the aes() for this function map x to displ and y to hwy. Add geom_point() with aes(color = class).

base <- mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
    geom_point(aes(color = class))

base <- mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
    geom_point(aes(color = class))

To control the overall position of the legend, you need to use a theme() setting. The theme setting legend.position controls where the legend is drawn. The default is legend.position = "right".

Exercise 2

Connect base, which is the ggplot2 object you have created, to theme(legend.position = "left") using a +.

Note that, colloquially, we will often use the verb "pipe" in this context. That is, we "pipe" each component of a plot together, just as we "pipe" data cleaning steps together, even though, in the former, we use a + to connect the statements.

base +
  theme(... = "left")

base +
  theme(legend.position = "left")

The legend's position is now to the left of the plot.

Exercise 3

Using your previous code, within your call to the function theme() change the argument's definition from "left" to "top".

base +
  theme(legend.position = ...)

base +
  theme(legend.position = "top")

We can see that the legend is taking too much space at the top of the plot. So let's adjust the way the legend is displayed.

Exercise 4

Using your previous code, add the guides() function to the pipeline. Within this function, add the color argument and set it equal to guide_legend(nrow = 3).

... +
  guides(color = ...(nrow = ...))

base +
  theme(legend.position = "top") +
  guides(color = guide_legend(nrow = 3))

This fixes our previous problem. If your plot is short and wide, place the legend at the top or bottom, and if it’s tall and narrow, place the legend at the left or right. You can also use legend.position = "none" to suppress the display of the legend altogether.

Replacing a Scale

Instead of just tweaking the details a little, you can replace the scale altogether. There are two types of scales you’re most likely to want to switch out: continuous position scales and color scales. Fortunately, the same principles apply to all the other aesthetics, so once you’ve mastered position and color, you’ll be able to quickly pick up other scale replacements.

Exercise 1

We will be working with a dataset called diamonds. Type in diamonds and hit "Run Code".

diamonds

diamonds

The dataset diamonds contains the prices and other attributes of almost 54,000 diamonds. Its ten variables are carat, cut, color, clarity, depth, table, price, x, y, and z; run ?diamonds for their definitions.

Exercise 2

Create a new pipeline. Pipe diamonds to the ggplot() function. Within this function map x to carat and y to price.

diamonds |> 
  ggplot(aes(x = ..., y = ...))

diamonds |> 
  ggplot(aes(x = carat, y = price))

Exercise 3

Using your previous code, add the geom_bin2d() function to the pipeline.

... +
  geom_bin2d()

diamonds |> 
  ggplot(aes(x = carat, y = price)) +
    geom_bin2d()

It’s very useful to plot transformations of your variable. For example, it’s easier to see the precise relationship between carat and price if we log transform them.

Exercise 4

Using your previous code, within the ggplot() function, wrap both carat and price in log10().

diamonds |> 
  ggplot(aes(x = log10(...), y = ...)) +
    geom_bin2d()

diamonds |> 
  ggplot(aes(x = log10(carat), y = log10(price))) +
    geom_bin2d()

However, the disadvantage of this transformation is that the axes are now labelled with the transformed values, making it hard to interpret the plot. Instead of doing the transformation in the aesthetic mapping, we can instead do it with the scale.

Exercise 5

Using your previous code, within the ggplot() function remove the log10() function around carat and price. Then add the function scale_x_log10() to the pipeline.

diamonds |> 
  ggplot(aes(x = ..., y = ...)) +
    geom_bin2d() +
    scale_x_log10()

diamonds |> 
  ggplot(aes(x = carat, y = price)) +
    geom_bin2d() +
    scale_x_log10()

Look at the x-axis labels. Note that the labelled values are no longer linearly spaced: measured in carats, the gap between the second and third labels is almost three times the gap between the first and second.
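The arithmetic behind a log scale can be sketched in base R. The break values below are hypothetical, chosen only for illustration; the labels on your plot may differ.

```r
# Hypothetical axis labels on a log10 scale; the breaks on your plot may differ.
labelled <- c(0.3, 1, 3)

# Differences in raw value grow from one label to the next ...
raw_gaps <- diff(labelled)

# ... but differences in log10 units, which set position on the axis, stay nearly equal.
log_gaps <- diff(log10(labelled))

print(raw_gaps)
print(round(log_gaps, 2))
#> [1] 0.52 0.48
```

On a log scale, equal distances along the axis correspond to roughly equal ratios between values rather than equal differences.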

Exercise 6

Using your previous code, add the scale_y_log10() function to the pipeline.

... + 
  scale_y_log10()

diamonds |> 
  ggplot(aes(x = carat, y = price)) +
    geom_bin2d() +
    scale_x_log10() +
    scale_y_log10()

This plot is much easier on the eyes due to the axes being labeled in units which make sense to the viewer.

Exercise 7

Create a new pipeline. Pipe mpg to the ggplot() function. Within this function, map x to displ and y to hwy. Add geom_point() to the pipeline and, within aes(), map color to drv.

... +
  geom_point(aes(color = ...))

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
    geom_point(aes(color = drv))

Another scale that is frequently customized is color. The default categorical scale picks colors that are evenly spaced around the color wheel. Useful alternatives are the ColorBrewer scales which have been hand-tuned to work better for people with common types of color blindness.

Exercise 8

Using your previous code, add a scale function, scale_color_brewer() to the pipeline. Within this function, add the argument palette and set it equal to "Set1".

... +
  scale_color_brewer(palette = ...)

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
    geom_point(aes(color = drv)) +
    scale_color_brewer(palette = "Set1")

Don’t forget simpler techniques for improving accessibility. If there are just a few colors, you can add a redundant shape mapping. This will also help ensure your plot is interpretable in black and white.

Exercise 9

Using your previous code, within your call to the geom_point() function, add to your mapping by setting shape equal to drv.

... +
    geom_point(aes(color = drv, shape = ...)) +
    scale_color_brewer(palette = "Set1")

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
    geom_point(aes(color = drv, shape = drv)) +
    scale_color_brewer(palette = "Set1")

The ColorBrewer scales are documented online at https://colorbrewer2.org/ and made available in R via the RColorBrewer package, by Erich Neuwirth.

Exercise 10

Create a new pipeline and pipe the dataset presidential to the mutate() function. Within the mutate() function, create a new variable called id and set it equal to 33 plus row_number(). Add the ggplot() function to the pipeline and map x to start, y to id, and color to party. Add geom_point().

presidential |> 
  mutate(id = 33 + ...)

presidential |> 
  mutate(id = 33 + row_number()) |> 
  ggplot(aes(x = start, y = id, color = party)) +
    geom_point()

For continuous color, you can use the built-in scale_color_gradient() or scale_fill_gradient().

Exercise 11

Add geom_segment() to the pipeline. Within aes(), set xend to end and yend to id.

... +
  geom_segment(aes(xend = ..., yend = ...))

presidential |> 
  mutate(id = 33 + row_number()) |> 
  ggplot(aes(x = start, y = id, color = party)) +
    geom_point() +
    geom_segment(aes(xend = end, yend = id))

If you have a diverging scale, you can use scale_color_gradient2(). That allows you to give, for example, positive and negative values different colors. That’s sometimes also useful if you want to distinguish points above or below the mean.

Exercise 12

Using your previous code, add the scale_color_manual() function to the pipeline. Within this function, add the argument values and set it equal to c(Republican = "#E81B23", Democratic = "#00AEF3").

... +
  scale_color_manual(values = ...)

presidential |> 
  mutate(id = 33 + row_number()) |> 
  ggplot(aes(x = start, y = id, color = party)) +
    geom_point() +
    geom_segment(aes(xend = end, yend = id)) +
    scale_color_manual(values = c(Republican = "#E81B23", Democratic = "#00AEF3"))

When we map presidential party to color, we want to use the standard mapping of red for Republicans and blue for Democrats. One approach for assigning these colors is to use hex color codes, as shown above.
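If hex codes are unfamiliar, base R can decode them. This is a small sketch using col2rgb() from grDevices (attached by default); the object names are illustrative.

```r
# Party colors from the exercise above.
party_colors <- c(Republican = "#E81B23", Democratic = "#00AEF3")

# col2rgb() decodes each pair of hex digits into a 0-255 intensity.
rgb_values <- col2rgb(party_colors)
print(rgb_values)
#>       Republican Democratic
#> red          232          0
#> green         27        174
#> blue          35        243
```

The first code is dominated by its red channel and the second by its blue channel, which matches the colors seen in the plot.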

Exercise 13

For our next plot, we will create a plot with the dataset random_vals. Type in random_vals and hit "Run Code".

random_vals

random_vals

Another option is to use the viridis color scales. The designers, Nathaniel Smith and Stéfan van der Walt, carefully tailored continuous color schemes that are perceptible to people with various forms of color blindness as well as perceptually uniform in both color and black and white.

Exercise 14

Pipe random_vals to ggplot(). Within ggplot(), map x to random_x and y to random_y. Add geom_hex() and coord_fixed() to the pipeline.

random_vals |> 
  ggplot(aes(x = ..., ... = random_y)) +
    ... +
    coord_fixed()

random_vals |> 
  ggplot(aes(x = random_x, y = random_y)) +
    geom_hex() +
    coord_fixed()

viridis scales are available as continuous (c), discrete (d), and binned (b) palettes in ggplot2.

Exercise 15

Using your previous code, add the scale_fill_viridis_c() function to the pipeline.

... +
  scale_fill_viridis_c()

random_vals |> 
  ggplot(aes(x = random_x, y = random_y)) +
    geom_hex() +
    coord_fixed() +
    scale_fill_viridis_c()

Note that all color scales come in two varieties: scale_color_*() and scale_fill_*() for the color and fill aesthetics respectively (the color scales are available in both UK and US spellings).

Exercise 16

Using your previous code, change the scale_fill_viridis_c() to scale_fill_viridis_b().

... +
  scale_fill_viridis_b()

random_vals |> 
  ggplot(aes(x = random_x, y = random_y)) +
    geom_hex() +
    coord_fixed() +
    scale_fill_viridis_b()

As you can see, the fill now changes in discrete steps rather than smoothly, unlike the continuous plot before. This is because scale_fill_viridis_b() bins the values before mapping them to the viridis color scale.
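The effect of binning can be sketched with base R's cut(). It is used here only as a stand-in: this is not what ggplot2 calls internally, and the counts and break points are made up for illustration.

```r
# Made-up hexagon counts and break points, for illustration only.
counts <- c(3, 12, 27, 41, 58)
bins <- cut(counts, breaks = c(0, 20, 40, 60))

# Five distinct counts collapse into three bins, so only three colors would be needed.
print(table(bins))
```

A continuous scale would give each of the five counts its own shade, while a binned scale gives every count in the same interval the same color.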

Zooming

There are three ways to control the plot limits:

Adjusting what data are plotted.
Setting the limits in each scale.
Setting xlim and ylim in coord_cartesian().

The last of these is probably what you want to use in most cases because it ensures that any modelling is performed on the entire data set.

Exercise 1

Pipe mpg to ggplot(). Within aes(), map x to displ and y to hwy. Add geom_point() to the pipeline and, within its aes(), map color to drv. Then add geom_smooth() to the pipeline.

... |> 
  ggplot(aes(... = displ, y = ...)) +
    geom_point(...(color = drv)) +
    geom_smooth()

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
    geom_point(aes(color = drv)) +
    geom_smooth()

This plot shows the relationship between engine size (displ) and fuel efficiency (hwy), colored by type of drive train (drv).

Exercise 2

Using the same pipe, add filter() to the pipe in between the beginning of the pipe and your call to ggplot(). Within filter(), include displ greater than or equal to 5 and displ less than or equal to 6.

mpg |> 
  filter(displ >= ... & displ <= ...) |> 
  ...

mpg |> 
  filter(displ >= 5 & displ <= 6) |> 
  ggplot(aes(x = displ, y = hwy)) +
    geom_point(aes(color = drv)) +
    geom_smooth()

One way to zoom in on a plot is to decrease the range of the data which is plotted. The "problem" with this approach is that only the remaining data are used by geoms like geom_smooth(). This is probably not what you want: subsetting the data has affected the x and y scales as well as the smooth curve.

Exercise 3

Remove the filter() line in your pipeline.

There is a limits argument on individual scales like scale_x_continuous() and scale_y_continuous(). Reducing the limits is equivalent to subsetting the data. To see this, add scale_x_continuous() with limits set to c(5, 6) and scale_y_continuous() with limits set to c(10, 25) to the pipeline.

... +
    scale_x_continuous(limits = ...) +
    scale_y_continuous(... = c(10, 25))

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
    geom_point(aes(color = drv)) +
    geom_smooth() +
    scale_x_continuous(limits = c(5, 6)) +
    scale_y_continuous(limits = c(10, 25))

As you can see, setting the limits in each scale is another way to control the plot limits. Since this is probably not what we want to do, ggplot2 generates a warning, reminding us about all the data which we are ignoring.

Exercise 4

Using your previous pipeline, delete scale_x_continuous() and scale_y_continuous().

Add coord_cartesian(), setting the xlim argument to c(5, 6) and the ylim argument to c(10, 25).

... +
    coord_cartesian(xlim = c(5, 6), 
                    ylim = c(10, 25))

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
    geom_point(aes(color = drv)) +
    geom_smooth() +
    coord_cartesian(xlim = c(5, 6), 
                    ylim = c(10, 25))

To zoom in on a region of the plot, it’s generally best to use coord_cartesian().

On the other hand, setting the limits on individual scales is generally more useful if you want to expand the limits, e.g., to match scales across different plots.
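The difference between scale limits and coord_cartesian() can be checked directly. The sketch below assumes ggplot2 is installed and uses layer_data() to inspect what a layer holds after the plot is built; the object names are illustrative.

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Scale limits: x values outside [5, 6] are discarded before drawing.
scale_zoom <- base_plot + scale_x_continuous(limits = c(5, 6))

# coord_cartesian(): the data are untouched; only the visible window changes.
coord_zoom <- base_plot + coord_cartesian(xlim = c(5, 6))

# Count the usable x values each version of the plot retains.
kept_by_scale <- sum(!is.na(layer_data(scale_zoom)$x))
kept_by_coord <- sum(!is.na(layer_data(coord_zoom)$x))

print(c(scale = kept_by_scale, coord = kept_by_coord, total = nrow(mpg)))
```

Because coord_cartesian() keeps every row, a layer such as geom_smooth() would still be fitted to the full dataset, whereas the scale-limits version only has the surviving points to work with.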

Themes

In this section, we will learn about how to customize the non-data elements of your plot with a theme.

The ggplot2 package includes eight themes, with theme_gray() as the default. Many more are included in add-on packages such as ggthemes (https://jrnold.github.io/ggthemes), by Jeffrey Arnold. You can also create your own themes, if you are trying to match a particular corporate or journal style.

Exercise 1

Pipe mpg to ggplot(). Within aes(), set x = displ and y = hwy. Follow with geom_point(aes(color = class)). Finish with geom_smooth(se = FALSE).

... |> 
  ggplot(aes(x = ..., ... = hwy)) +
    geom_point(...(color = class)) +
    geom_smooth(se = ...)

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
    geom_point(aes(color = class)) +
    geom_smooth(se = FALSE)

This plot shows the theme_gray() default.

Many people wonder why the default theme has a gray background. This was a deliberate choice because it puts the data forward while still making the grid lines visible. The white grid lines are visible (which is important because they significantly aid position judgments), but they have little visual impact and we can easily tune them out. The gray background gives the plot a similar typographic color to the text, ensuring that the graphics fit in with the flow of a document without jumping out with a bright white background. Finally, the gray background creates a continuous field of color which ensures that the plot is perceived as a single visual entity.

Exercise 2

Add theme_bw() to the pipeline.

... +
    theme_bw()

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
    geom_point(aes(color = class)) +
    geom_smooth(se = FALSE) +
    theme_bw()

This is the classic dark-on-light ggplot2 theme. This theme may work better for presentations displayed with a projector.

Exercise 3

Replace theme_bw() with theme_linedraw().

... +
    theme_linedraw()

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
    geom_point(aes(color = class)) +
    geom_smooth(se = FALSE) +
    theme_linedraw()

theme_linedraw() uses only black lines of various widths on a white background, reminiscent of a line drawing. It serves a purpose similar to theme_bw().

Exercise 4

Replace theme_linedraw() with theme_light().

... +
    theme_light()

mpg |> 
  ggplot(aes(x = displ, y = hwy)) +
    geom_point(aes(color = class)) +
    geom_smooth(se = FALSE) +
    theme_light()

theme_light() is similar to theme_linedraw() but with light grey lines and axes, to direct more attention towards the data. There are four other themes built into ggplot2: theme_dark(), theme_minimal(), theme_classic(), and theme_void().

Exercise 5

Having learned about the ggplot2 themes, we will now create this graphic:

themes_plot <- mpg |> 
  ggplot(aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  labs(
    title = "Larger engine sizes tend to have lower fuel economy",
    caption = "Source: https://fueleconomy.gov."
  ) +
  theme(
    legend.position = c(0.6, 0.7),
    legend.direction = "horizontal",
    legend.box.background = element_rect(color = "black"),
    plot.title = element_text(face = "bold"),
    plot.title.position = "plot",
    plot.caption.position = "plot",
    plot.caption = element_text(hjust = 0)
  )

themes_plot

Start a new pipeline with

ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()

ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()

ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()

It’s possible to control individual components of each theme, like the size and color of the font used for the y axis. We’ve already seen that legend.position controls where the legend is drawn. There are many other aspects of the legend that can be customized with theme().

Exercise 6

Using labs(), set title to "Larger engine sizes tend to have lower fuel economy" and caption to "Source: https://fueleconomy.gov.".

... +
  labs(title = ...,
       caption = ...)

ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  labs(
    title = "Larger engine sizes tend to have lower fuel economy",
    caption = "Source: https://fueleconomy.gov."
  )

In the plot which we are trying to create, we need to change the direction of the legend as well as put a black border around it. The theme() function lets us make very detailed changes to a plot.

Exercise 7

Add theme() to the pipeline. Within theme(), add the legend.position argument and set it equal to c(0.6, 0.7).

... +
  theme(
    legend.position = ...
    )

ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  labs(
    title = "Larger engine sizes tend to have lower fuel economy",
    caption = "Source: https://fueleconomy.gov."
  ) +
  theme(
    legend.position = c(0.6, 0.7)
  )

The legend.position argument sets the position of legends ("none", "left", "right", "bottom", "top", or a two-element numeric vector).
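Newer ggplot2 releases (3.5.0 and later, as far as we are aware) deprecate the numeric form of legend.position in favor of legend.position = "inside" combined with legend.position.inside. A minimal sketch of the equivalent call, assuming a recent ggplot2 is installed:

```r
library(ggplot2)

# Same placement as c(0.6, 0.7), written in the newer two-argument style.
# Assumes ggplot2 >= 3.5.0; older versions do not know legend.position.inside.
inside_plot <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  theme(
    legend.position = "inside",
    legend.position.inside = c(0.6, 0.7)
  )

inside_plot
```

The two numbers are relative coordinates within the plotting area, so c(0.6, 0.7) places the legend slightly right of center and in the upper part of the panel.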

Exercise 8

Within theme(), add the legend.direction argument and set it equal to "horizontal".

... +
  theme(
    legend.position = c(0.6, 0.7),
    legend.direction = ...
    )

ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  labs(
    title = "Larger engine sizes tend to have lower fuel economy",
    caption = "Source: https://fueleconomy.gov."
  ) +
  theme(
    legend.position = c(0.6, 0.7),
    legend.direction = "horizontal"
  )

The legend.direction argument sets the layout of items in legends ("horizontal" or "vertical"). Note the change in the legend in this plot relative to the previous one.

Exercise 9

Within theme(), add the legend.box.background argument and set it equal to element_rect(color = "black").

... +
  theme(
    legend.position = ...,
    legend.direction = ...,
    legend.box.background = ...
    )

ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  labs(
    title = "Larger engine sizes tend to have lower fuel economy",
    caption = "Source: https://fueleconomy.gov."
  ) +
  theme(
    legend.position = c(0.6, 0.7),
    legend.direction = "horizontal",
    legend.box.background = element_rect(color = "black")
  )

Note that customization of the legend box and plot title elements of the theme is done with element_*() functions. These functions specify the styling of non-data components, e.g., the legend border color is defined in the color argument of element_rect().

Exercise 10

Within theme(), add the plot.title argument and set it equal to element_text(face = "bold").

... +
  theme(
    legend.position = ...,
    legend.direction = ...,
    legend.box.background = ...,
    plot.title = ...
    )

ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  labs(
    title = "Larger engine sizes tend to have lower fuel economy",
    caption = "Source: https://fueleconomy.gov."
  ) +
  theme(
    legend.position = c(0.6, 0.7),
    legend.direction = "horizontal",
    legend.box.background = element_rect(color = "black"),
    plot.title = element_text(face = "bold")
  )

The plot.title argument controls the appearance of the plot title text. It takes an element_text() value, inherits from title, and is left-aligned by default.

Exercise 11

Within theme(), let's make three more additions: Set plot.title.position to "plot", plot.caption.position to "plot", and plot.caption to element_text(hjust = 0).

... +
  theme(
    legend.position = ...,
    legend.direction = ...,
    legend.box.background = ...,
    plot.title = ...,
    plot.title.position = ...,
    plot.caption.position = ..., 
    plot.caption = ...
    )

ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  labs(
    title = "Larger engine sizes tend to have lower fuel economy",
    caption = "Source: https://fueleconomy.gov."
  ) +
  theme(
    legend.position = c(0.6, 0.7),
    legend.direction = "horizontal",
    legend.box.background = element_rect(color = "black"),
    plot.title = element_text(face = "bold"),
    plot.title.position = "plot",
    plot.caption.position = "plot",
    plot.caption = element_text(hjust = 0)
  )

The theme elements that control the position of the title and the caption are plot.title.position and plot.caption.position, respectively. In this plot, these are set to "plot" to indicate that these elements are aligned to the entire plot area, instead of the plot panel (the default).

The plot.caption argument with value element_text(hjust = 0) places the caption flush left. It would otherwise be right-aligned by default.

Reminder: This is what our plot should look like:

themes_plot

For an overview of all theme() components, see help with ?theme. The ggplot2 book is also a great place to go for the full details on theming.

Layout

So far we talked about how to create and modify a single plot. What if you have multiple plots you want to lay out in a certain way? The patchwork package allows you to combine separate plots into the same graphic.

Exercise 1

We have created several plots (p1 through p5) for you to use in these exercises. Type p1 and hit "Run Code".

p1

p1

The ggplot2 package provides a strong API for sequentially building up a plot, but does not concern itself with composition of multiple plots. patchwork is a package that expands the API to allow for arbitrarily complex composition of plots by, among others, providing mathematical operators for combining multiple plots.

Exercise 2

Enter p1 on the first line and p2 on the second line. Hit "Run Code".

...
p2

p1
p2

To place two plots next to each other, you can simply add them to each other, if you have loaded the patchwork package. Note that you first need to create the plots and save them as objects (in the following example they’re called p1 and p2). Then, you place them next to each other with +.

Exercise 3

Type p1 + p2 and hit "Run Code".

p1 + ...

p1 + p2

It’s important to note that in the above code chunk we did not use a new function from the patchwork package. Instead, the package added new functionality to the + operator.

Exercise 4

You can also create complex plot layouts with patchwork. Type (p1 | p3) / p2 and hit "Run Code".

(... | p3) / ...

(p1 | p3) / p2

| places p1 and p3 next to each other and / moves p2 to the next line.

Exercise 5

Additionally, patchwork allows you to collect legends from multiple plots into one common legend, customize the placement of the legend as well as dimensions of the plots, and add a common title, subtitle, caption, etc. to your plots. Start with (p1 + p2) / (p3 + p4) / p5.

(p1 + p2) ... (p3 + p4) / ...

(p1 + p2) / (p3 + p4) / p5

This creates a collection of plots with two plots in the first row, two in the second, and then a single plot in the third row.

Exercise 6

Modify the code by adding guide_area() to the front of the call.

... / (p1 + p2) / (p3 + p4) / p5

guide_area() / (p1 + p2) / (p3 + p4) / p5

By default plot guides (like legends) will be put on the side as with regular plots, but by adding a guide_area() to the plot you can tell patchwork to place the guides in that area instead.

Exercise 7

By design, only the density plots (p3 and p4) include a legend, because we want a single legend for the entire display. We also want a single title. To add this, we need to use the patchwork function plot_annotation(). Add it to our plot, setting the title argument to "City and highway mileage for cars with different drive trains" and the caption argument to "Source: https://fueleconomy.gov.".

guide_area() / (p1 + p2) / (p3 + p4) / p5 +
     ...(
         title = "City and highway mileage for cars with different drive trains",
         caption = ...
     )

guide_area() / (p1 + p2) / (p3 + p4) / p5 +
  plot_annotation(
      title = "City and highway mileage for cars with different drive trains",
      caption = "Source: https://fueleconomy.gov."
  )

patchwork has several functions that do for a collection of plots what ggplot2 functions do for an individual plot.

Exercise 8

plot_layout() provides fine-grained control over the details of how the plots are put together. Add plot_layout() with the guides argument set to "collect" and the heights argument set to c(1, 3, 2, 4).

... +
  plot_layout(
    ... = "collect",
    heights = ...
  )

guide_area() / (p1 + p2) / (p3 + p4) / p5 +
  plot_annotation(
      title = "City and highway mileage for cars with different drive trains",
      caption = "Source: https://fueleconomy.gov."
  ) +
  plot_layout(
    guides = "collect",
    heights = c(1, 3, 2, 4)
  )

We have customized the heights of the various components of our patchwork – the guide has a height of 1, the box plots 3, density plots 2, and the faceted scatterplot 4. patchwork divides up the area you have allotted for your plot using this scale and places the components accordingly.
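The proportions implied by those heights can be worked out in base R. This is only the arithmetic described above, under the assumption that the heights act as relative weights; the actual layout also has to make room for titles, captions, and axis text.

```r
# Relative heights passed to plot_layout() in the exercise.
heights <- c(guide = 1, boxplots = 3, densities = 2, scatter = 4)

# Share of the available height that each row receives.
shares <- heights / sum(heights)
print(shares)
```

With these weights the faceted scatterplot row gets four tenths of the space and the guide area one tenth.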

Exercise 9

The top of the collection is still a bit messed up. We can fix this by adding theme(legend.position = "top") as the last line. WARNING: Instead of using a + sign to connect this line to the rest of the code, you need to use an &.

... & 
    theme(legend.position = "top")

guide_area() / (p1 + p2) / (p3 + p4) / p5 +
    plot_annotation(
        title = "City and highway mileage for cars with different drive trains",
        caption = "Source: https://fueleconomy.gov."
    ) +
    plot_layout(
        guides = "collect",
        heights = c(1, 3, 2, 4)
    ) & 
    theme(legend.position = "top")

Note the use of the & operator here instead of the usual +. This is because we’re modifying the theme for the patchwork plot as opposed to the individual ggplots. The legend is placed on top, inside the guide_area().

Summary

This tutorial covered Chapter 11: Communication from R for Data Science (2e) by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund. In this tutorial we made use of three packages associated with ggplot2: scales, ggrepel, and patchwork. Key commands included quote() which simply returns its argument and geom_label_repel() which adds text directly to the plot.

ggplot2: Elegant Graphics for Data Analysis (3e) by Hadley Wickham, Danielle Navarro, and Thomas Lin Pedersen is the best source for all the details about making beautiful graphics with ggplot2.


r4ds.tutorials documentation built on April 3, 2025, 5:50 p.m.
