library(learnr)
library(tutorial.helpers)
library(tidyverse)
library(scales)
library(ggrepel)
library(patchwork)

knitr::opts_chunk$set(echo = FALSE)
options(tutorial.exercise.timelimit = 60, tutorial.storage = "local")

labels_tib <- tibble(
  start = 1:10,
  end = cumsum(start^2)
)

label_info <- mpg |>
  arrange(desc(displ)) |>
  slice_head(n = 1, by = drv) |>
  mutate(
    drive_type = case_when(
      drv == "f" ~ "front-wheel drive",
      drv == "r" ~ "rear-wheel drive",
      drv == "4" ~ "4-wheel drive"
    )
  ) |>
  select(displ, hwy, drv, drive_type)

potential_outliers <- mpg |>
  filter(hwy > 40 | (hwy > 20 & displ > 5))

trend_text <- "Larger engine sizes tend to have lower fuel economy." |>
  str_wrap(width = 30)

# For Legend layout section
base <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class))

random_vals <- tibble(
  random_x = rnorm(10000),
  random_y = rnorm(10000)
)

suv <- mpg |> filter(class == "suv")
compact <- mpg |> filter(class == "compact")

## Layout section
p1 <- ggplot(mpg, aes(x = drv, y = cty, color = drv)) +
  geom_boxplot(show.legend = FALSE) +
  labs(title = "Plot 1")

p2 <- ggplot(mpg, aes(x = drv, y = hwy, color = drv)) +
  geom_boxplot(show.legend = FALSE) +
  labs(title = "Plot 2")

p3 <- ggplot(mpg, aes(x = cty, color = drv, fill = drv)) +
  geom_density(alpha = 0.5) +
  labs(title = "Plot 3")

p4 <- ggplot(mpg, aes(x = hwy, color = drv, fill = drv)) +
  geom_density(alpha = 0.5) +
  labs(title = "Plot 4")

p5 <- ggplot(mpg, aes(x = cty, y = hwy, color = drv)) +
  geom_point(show.legend = FALSE) +
  facet_wrap(~drv) +
  labs(title = "Plot 5")
This tutorial covers Chapter 11: Communication from R for Data Science (2e) by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund. In this tutorial we will be making use of three packages associated with ggplot2: scales, ggrepel, and patchwork. Key commands include quote()
which returns its argument unevaluated, and geom_label_repel()
which adds text directly to the plot.
The easiest place to start when turning an exploratory graphic into an expository graphic is with good labels. You add labels with the labs()
function. We will create this plot:
p_1 <- mpg |>
  ggplot(aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(se = FALSE) +
  labs(
    x = "Engine displacement (L)",
    y = "Highway fuel economy (mpg)",
    color = "Car type",
    title = "Fuel efficiency generally decreases with engine size",
    subtitle = "Two seaters (sports cars) are an exception because of their light weight",
    caption = "Data from fueleconomy.gov"
  )

p_1
Load the tidyverse package using the library()
function.
library(...)
library(tidyverse)
In previous tutorials, you learned how to use plots as tools for exploration. When you make exploratory plots, you know—even before looking—which variables the plot will display. You made each plot for a purpose, could quickly look at it, and then move on to the next plot. In the course of most analyses, you’ll produce tens or hundreds of plots, most of which are immediately thrown away.
Load the scales package using the library()
function.
library(...)
library(scales)
The scales package is used to override the default breaks, labels, transformations and palettes.
You need to communicate your understanding to others. Your audience will likely not share your background knowledge and will not be deeply invested in the data.
Load the ggrepel package using library()
.
library(...)
library(ggrepel)
The ggrepel package will automatically adjust labels so that they don’t overlap. To help others quickly build up a good mental model of the data, you will need to invest considerable effort in making your plots as self-explanatory as possible.
Load the patchwork package using library()
.
library(...)
library(patchwork)
The patchwork package allows you to combine separate plots into the same graphic.
We recommend pairing this tutorial with a good general visualization book. We particularly like The Truthful Art, by Alberto Cairo. It doesn’t teach the mechanics of creating visualizations, but instead focuses on what you need to think about in order to create effective graphics.
Now, let's explore the dataset that we want to create a plot on. Type in mpg
and hit "Run Code".
mpg
mpg
The mpg
dataset provides fuel economy data from 1999 and 2008 for 38 popular models of cars. The dataset is shipped with the ggplot2 package.
Pipe mpg
to ggplot()
.
mpg |> ...
mpg |> ggplot()
As always, ggplot()
alone, without the use of the aes()
function as an argument to mapping
, produces an empty rectangle.
Within ggplot()
, using aes()
, set the x
argument to the displ
variable and the y
argument to the hwy
variable.
mpg |> ggplot(aes(x = ..., y = ...))
mpg |> ggplot(aes(x = displ, y = hwy))
Add geom_point()
to the pipeline.
... + geom_point()
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point()
We can finally see some data. Although there is more plotting to do, it is not too early to start thinking about the labels we will be using.
Within geom_point()
, using aes()
, set the color
argument to the class
variable.
... + geom_point(aes(color = ...))
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point(aes(color = class))
The purpose of a plot title is to summarize the main finding. Avoid titles that just describe what the plot is, e.g., “A scatterplot of engine displacement vs. fuel economy”.
Add geom_smooth()
to the pipeline.
... + geom_smooth()
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point(aes(color = class)) + geom_smooth()
The subtitle
adds additional detail in a smaller font beneath the title. If there is one key conclusion which readers should come away with, spell it out in the subtitle.
Within geom_smooth()
, add the se
argument and set it to FALSE
.
... + geom_smooth(se = ...)
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point(aes(color = class)) + geom_smooth(se = FALSE)
The caption
adds text at the bottom right of the plot, often used to describe the source of the data. Any plot you make should have a caption since your readers will always want to know where the data come from.
Add a title
, subtitle
, x
axis title, y
axis title, legend title (color
), and a caption
by adding labs()
to the pipeline.
... + labs( title = ..., subtitle = ..., x = ..., y = ..., color = ..., caption = ... )
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point(aes(color = class)) + geom_smooth(se = FALSE) + labs( x = "Engine displacement (L)", y = "Highway fuel economy (mpg)", color = "Car type", title = "Fuel efficiency generally decreases with engine size", subtitle = "Two seaters (sports cars) are an exception because of their light weight", caption = "Data from fueleconomy.gov" )
Reminder: The graphic should look something like this
p_1
Let's move on to a new plot. Type in labels_tib
and hit "Run Code".
labels_tib
labels_tib
The tibble, labels_tib
, is a premade tibble that we will use to show that mathematical expressions can be used instead of text strings
for the labels if we swap the quotation marks ""
for the function quote()
.
Pipe labels_tib
to the function ggplot()
.
labels_tib |> ...()
labels_tib |> ggplot()
Within this function, using aes()
, set the x
argument to the start
variable and set the y
argument to the end
variable.
labels_tib |> ggplot(aes(x = start, y = ...))
labels_tib |> ggplot(aes(x = start, y = end))
Add the geom_point()
function to the pipeline.
... + geom_point()
labels_tib |> ggplot(aes(x = start, y = end)) + geom_point()
Using labs()
, set the x
axis title to quote(x[i])
.
... + labs( x = ... )
labels_tib |> ggplot(aes(x = start, y = end)) + geom_point() + labs( x = quote(x[i]) )
Within labs()
, set the y
axis title to quote(sum(x[i] ^ 2, i == 1, n))
.
... + labs( x = quote(x[i]), y = ... )
labels_tib |> ggplot(aes(x = start, y = end)) + geom_point() + labs( x = quote(x[i]), y = quote(sum(x[i] ^ 2, i == 1, n)) )
Type in ?plotmath
in the Console and learn more about what kind of syntax you can input into the quote()
function. Copy and paste one interesting piece of syntax, along with its meaning, into the text box below and hit "Submit".
question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, try_again_button = "Edit Answer", incorrect = NULL, rows = 6)
In addition to labeling major components of your plot, it’s often useful to label individual observations or groups of observations. The first tool you have at your disposal is geom_text()
, a function similar to geom_point()
, but with an additional aesthetic: label
. This makes it possible to add text labels to your plots.
In this section, we will make plots which look like this:
p_2 <- mpg |>
  ggplot(aes(x = displ, y = hwy, color = drv)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  geom_text(
    data = label_info,
    mapping = aes(x = displ, y = hwy, label = drive_type),
    fontface = "bold", size = 5, hjust = "right", vjust = "bottom"
  ) +
  theme(legend.position = "none")

p_2
There are two possible sources of labels. First, you might have a tibble that provides labels. Type label_info
and hit "Run Code".
label_info
label_info
label_info
is a tibble with information about three specific cars from mpg
. We can use this tibble to directly label the three groups and replace the legend by placing labels directly on the plot.
Create a new pipeline. Pipe mpg
to the ggplot()
. Using aes()
, map x
to displ
and y
to hwy
. Add geom_point()
.
mpg |> ggplot(aes(x = ..., y = ...)) + ...()
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point()
This is not a bad plot, but it fails to use all the data we have about individual cars.
Within aes()
, map color
to drv
.
mpg |> ggplot(aes(x = displ, y = hwy, ... = ...)) + geom_point()
mpg |> ggplot(aes(x = displ, y = hwy, color = drv)) + geom_point()
Using color as another "dimension" allows us to see another pattern. Front-wheel drive cars (f
) have smaller engines and get better mileage. But notice how hard this is to read for a new viewer. How are they to know what f
means?
Within geom_point()
, add the alpha
argument and set it equal to 0.3
.
... + geom_point(alpha = ...)
mpg |> ggplot(aes(x = displ, y = hwy, color = drv)) + geom_point(alpha = 0.3)
Using alpha
helps, first, to highlight where the data is densest and, second, to lessen the busyness of the plot, the better to allow space for labels.
Add geom_smooth()
with method
set to "loess"
, formula
to y ~ x
, and se
to FALSE
.
... + geom_smooth(method = ..., ... = y ~ x, se = ...)
mpg |> ggplot(aes(x = displ, y = hwy, color = drv)) + geom_point(alpha = 0.3) + geom_smooth(method = "loess", formula = y ~ x, se = FALSE)
The lines cover different regions because, for example, there are no front-wheel drive (f
) cars with engines much larger than 5 liters.
This plot is not bad, but reading it would be easier if the labels were included in the graphic itself.
Add geom_text()
to the plot. Within geom_text()
, add the data
argument and set it equal to label_info
. Set the mapping
argument equal to aes(label = drive_type)
.
... + geom_text( data = ..., mapping = ... )
mpg |> ggplot(aes(x = displ, y = hwy, color = drv)) + geom_point(alpha = 0.3) + geom_smooth(method = "loess", formula = y ~ x, se = FALSE) + geom_text( data = label_info, mapping = aes(label = drive_type) )
There are two improvements. First, instead of opaque abbreviations like 4
or f
, we are now using words, like "4-wheel drive" and "front-wheel drive" that most viewers will understand. Second, that information is in the plot itself, rather than off to the side in the legend.
We have control over how our labels look. Within your call to geom_text()
, set the fontface
argument to "bold"
and the size
argument to 5
.
... + geom_text( data = label_info, mapping = aes(label = drive_type), fontface = ..., ... = 5 )
mpg |> ggplot(aes(x = displ, y = hwy, color = drv)) + geom_point(alpha = 0.3) + geom_smooth(method = "loess", formula = y ~ x, se = FALSE) + geom_text( data = label_info, mapping = aes(label = drive_type), fontface = "bold", size = 5 )
The fontface
and size
arguments let us customize the look of the text labels. In this case, they make the labels bold and larger than the rest of the text on the plot.
The labels are easier to read, but they are still misplaced. Within geom_text()
, set the hjust
to "right"
and the vjust
argument to "bottom"
.
... + geom_text( data = label_info, mapping = aes(label = drive_type), fontface = "bold", size = 5, hjust = ..., ... = "bottom" )
mpg |> ggplot(aes(x = displ, y = hwy, color = drv)) + geom_point(alpha = 0.3) + geom_smooth(method = "loess", formula = y ~ x, se = FALSE) + geom_text( data = label_info, mapping = aes(label = drive_type), fontface = "bold", size = 5, hjust = "right", vjust = "bottom" )
We use hjust
(horizontal justification) and vjust
(vertical justification) to control the alignment of the label.
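As a rough sketch of how these two arguments behave, the example below labels three points from a small made-up data frame (`demo_labels` and its values are hypothetical and not part of the tutorial data). The string values are shorthand for numbers between 0 and 1: "left" and "bottom" correspond to 0, while "right" and "top" correspond to 1.

```r
library(ggplot2)

# Hypothetical label positions, invented purely for illustration.
demo_labels <- data.frame(
  displ = c(2, 4, 6),
  hwy   = c(40, 30, 20),
  text  = c("first", "second", "third")
)

# hjust = "right" puts the right edge of each label at its x position;
# vjust = "bottom" puts the bottom edge of each label at its y position.
p_just <- ggplot(demo_labels, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_text(aes(label = text), hjust = "right", vjust = "bottom")

# The same alignment written with numbers instead of strings.
p_just_numeric <- ggplot(demo_labels, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_text(aes(label = text), hjust = 1, vjust = 0)

p_just
```

With this alignment each label sits above and to the left of its point, which is why the labels in the exercise move away from the ends of the smooth lines.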
Add theme()
to the pipeline. Within theme()
, add the argument legend.position
and set it to "none"
.
... + theme(legend.position = ...)
mpg |> ggplot(aes(x = displ, y = hwy, color = drv)) + geom_point(alpha = 0.3) + geom_smooth(method = "loess", formula = y ~ x, se = FALSE) + geom_text( data = label_info, mapping = aes(label = drive_type), fontface = "bold", size = 5, hjust = "right", vjust = "bottom" ) + theme(legend.position = "none")
The annotated plot we made is hard to read because the labels overlap with each other and with the points. We can use the geom_label_repel()
function from the ggrepel package to address both of these issues. This useful package will automatically adjust labels so that they don’t overlap as you can see above.
Change geom_text()
to geom_label_repel()
. Remove the hjust
and vjust
arguments. Also, set the nudge_y
argument to 2
.
... + geom_..._repel( data = label_info, mapping = aes(label = drive_type), fontface = "bold", size = 5, nudge_y = 2 ) + ...
mpg |> ggplot(aes(x = displ, y = hwy, color = drv)) + geom_point(alpha = 0.3) + geom_smooth(method = "loess", formula = y ~ x, se = FALSE) + geom_label_repel( data = label_info, mapping = aes(label = drive_type), fontface = "bold", size = 5, nudge_y = 2 ) + theme(legend.position = "none")
You can also use the same idea to highlight certain points on a plot with geom_text_repel()
from the ggrepel package.
Type potential_outliers
and hit "Run Code".
potential_outliers
This is a tibble of cars with either hwy > 40
or hwy > 20 & displ > 5
. In other words, they are outside of the typical relationship between hwy
and displ
.
Before identifying these outliers in the graphic, we need to set up our initial plot. With a new pipeline, pipe mpg
to ggplot()
. Within this ggplot()
, using aes()
, map x
to displ
and y
to hwy
. Add geom_point()
so that the data is displayed.
... |> ggplot(aes(x = ..., y = hwy)) + ...()
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point()
In addition to geom_text()
and geom_label()
, you have many other geoms in ggplot2 available to help annotate your plot. For example, you can use geom_hline()
and geom_vline()
to add reference lines. We often make them thick (linewidth = 2
) and white (color = "white"
), and draw them underneath the primary data layer. That makes them easy to see, without drawing attention away from the data.
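A minimal sketch of the reference-line idea described above might look like the following. The intercept values (30 and 5) are arbitrary choices made for illustration. The lines are added before `geom_point()` so that they are drawn underneath the data.

```r
library(ggplot2)

# Layers are drawn in the order they are added, so the reference
# lines come first and the points are drawn on top of them.
p_ref <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_hline(yintercept = 30, linewidth = 2, color = "white") + # illustrative value
  geom_vline(xintercept = 5, linewidth = 2, color = "white") +  # illustrative value
  geom_point()

p_ref
```

White lines are visible here only because the default theme has a gray panel background; with a white-background theme a different color would be needed.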
Add the geom_text_repel()
to the pipeline. Within geom_text_repel()
, set the data
argument equal to potential_outliers
and the mapping
argument to aes(label = model)
.
... + geom_text_repel(data = ..., mapping = aes(... = model))
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point() + geom_text_repel(data = potential_outliers, mapping = aes(label = model))
Our graphic is combining information from two different data sets: mpg
and potential_outliers
. ggplot()
is, implicitly, assigning mpg
as the value of data
. geom_text_repel()
is, instead, using potential_outliers
.
Using your previous code, add the geom_point()
function to the pipeline. Within this function, set data
equal to potential_outliers
and color
equal to "red"
.
... + geom_point(data = ..., color = "...")
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point() + geom_text_repel(data = potential_outliers, mapping = aes(label = model)) + geom_point(data = potential_outliers, color = "red")
You can use geom_segment()
with the arrow argument to draw attention to a point with an arrow. Use aesthetics x
and y
to define the starting location, and xend
and yend
to define the end location.
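The sketch below shows one way this could look. The coordinates in `arrow_df` are hypothetical values picked only to make the arrow visible on the plot; they do not point at anything meaningful in the data.

```r
library(ggplot2)

# A one-row data frame with made-up start (x, y) and end (xend, yend) coordinates.
arrow_df <- data.frame(x = 3, y = 35, xend = 5, yend = 25)

p_arrow <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_segment(
    data = arrow_df,
    mapping = aes(x = x, y = y, xend = xend, yend = yend),
    arrow = arrow(type = "closed"), # arrow() is re-exported by ggplot2 from grid
    color = "red"
  )

p_arrow
```

The arrowhead is drawn at the end location (`xend`, `yend`), so the segment should start near the explanatory text and end at the point of interest.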
If we want to modify the outlier points further, it is often easier, or even necessary, to add another call to geom_point()
. It may seem strange to have three separate calls to geom_point()
in a single graphic, but making the graphic look exactly as we want often requires such gymnastics.
Add another call to geom_point()
. Set data
equal to potential_outliers
, color
equal to "red"
, size
equal to 3
, and shape
equal to "circle open"
.
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point() + geom_text_repel(data = potential_outliers, mapping = aes(label = model)) + geom_point(data = potential_outliers, color = "red") + geom_point(data = potential_outliers, color = "red", size = 3, shape = "circle open")
We added a second layer of large, hollow points to further highlight the labelled points.
Another geom that can help annotate your plot is geom_rect()
. You can use geom_rect()
to draw a rectangle around points of interest. The boundaries of the rectangle are defined by aesthetics xmin
, xmax
, ymin
, ymax
. Alternatively, look into the ggforce package, specifically geom_mark_hull()
, which allows you to annotate subsets of points with hulls.
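As a small illustration of the rectangle approach, the sketch below draws a red outline around the large-engine cars with unusually good mileage. The boundary values in `rect_df` are assumptions chosen by eye for this example, not values taken from the tutorial.

```r
library(ggplot2)

# Hypothetical rectangle boundaries, chosen only for illustration.
rect_df <- data.frame(xmin = 5, xmax = 7.2, ymin = 22, ymax = 28)

p_rect <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_rect(
    data = rect_df,
    mapping = aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
    inherit.aes = FALSE, # rect_df has no displ or hwy columns
    fill = NA,
    color = "red"
  )

p_rect
```

Setting `inherit.aes = FALSE` keeps the rectangle layer from looking for the `x` and `y` mappings defined in `ggplot()`, which do not exist in `rect_df`.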
Another handy function for adding annotations to plots is annotate()
. As a rule of thumb, geoms are generally useful for highlighting a subset of the data while annotate()
is useful for adding one or few annotation elements to a plot.
Create a new variable called trend_text
and set it to "Larger engine sizes tend to have lower fuel economy."
trend_text <- "Larger engine sizes tend to have lower fuel economy."
trend_text <- ...
trend_text <- "Larger engine sizes tend to have lower fuel economy."
To demonstrate using annotate()
, let’s create some text to add to our plot. The text is a bit long, so we’ll use stringr::str_wrap()
to automatically add line breaks to it given the number of characters you want per line.
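To see what the wrapping does before using it in a plot, a quick sketch like the one below can help. The names `long_text` and `wrapped` are placeholders introduced here for illustration.

```r
library(stringr)

long_text <- "Larger engine sizes tend to have lower fuel economy."

# str_wrap() inserts "\n" so that no line is longer than the target width.
wrapped <- str_wrap(long_text, width = 30)

# writeLines() shows the embedded line break as an actual new line.
writeLines(wrapped)
```

Printing `wrapped` directly would instead show the `\n` escape inside a single string, which is the form that plotting functions expect.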
Using your previous code, create a new pipeline and pipe trend_text
to the str_wrap()
function. Within this function add an argument called width
and set it to 30
.
trend_text <- ... |> str_wrap(width = ...)
trend_text <- "Larger engine sizes tend to have lower fuel economy." |> str_wrap(width = 30)
Now let's make use of this annotation within a plot. Create a new pipeline and pipe mpg
to the ggplot()
function. Within this function map x
to displ
and y
to hwy
using aes()
. Add the geom_point()
function to the pipeline.
mpg |> ggplot(aes(x = ..., y = ...)) + geom_point()
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point()
Using your previous code, add the annotate()
function to the pipeline. Within this function, set the argument label
to trend_text
, geom
to "label"
, x
to 3.5
, and y
to 38
.
mpg |> ggplot(aes(x = ..., y = ...)) + geom_point() + annotate( label = ..., geom = ..., x = ..., y = ... )
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point() + annotate(label = trend_text, geom = "label", x = 3.5, y = 38)
Now, let's align it horizontally using the argument hjust
.
Using your previous code, within your most recent call to the annotate()
function, set hjust
to "left"
and color
to "red"
.
... + annotate( geom = ..., x = ..., y = ..., label = ..., hjust = ..., color = ... )
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point() + annotate(label = trend_text, geom = "label", x = 3.5, y = 38, hjust = "left", color = "red")
Annotation is a powerful tool for communicating main takeaways and interesting features of your visualizations. The only limit is your imagination (and your patience with positioning annotations to be aesthetically pleasing)!
In this section, we will learn how to use scales to control the way plots display their data. The most common use of the scales package is to customize the appearance of axis and legend labels. Use a breaks_*
function to control how breaks are generated from the limits, and a label_*
function to control how breaks are turned into labels.
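As one possible illustration of these two families of functions, the sketch below uses `breaks_width()` to place a tick every fixed number of units and `label_number()` to append a unit to each tick label. The widths and the " mpg" suffix are arbitrary choices made for this example.

```r
library(ggplot2)
library(scales)

p_scales <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  # One tick for every liter of engine displacement.
  scale_x_continuous(breaks = breaks_width(1)) +
  # One tick every 5 mpg, with the unit added to each label.
  scale_y_continuous(
    breaks = breaks_width(5),
    labels = label_number(suffix = " mpg")
  )

p_scales
```

Both helpers return functions, which is why they are called with parentheses inside the scale: ggplot2 then applies the returned function to the axis limits or to the break values.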
When you make a regular plot, ggplot2 automatically adds scales for you. Let's make a simple plot to demonstrate this.
Pipe mpg
to the ggplot()
function. Within this function map x
to displ
and y
to hwy
within aes()
. Finish the pipeline with geom_point()
with aes(color = class)
inside of it.
mpg |> ggplot(aes(x = ..., y = ...)) + geom_point(aes(... = class))
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point(aes(color = class))
ggplot2 automatically adds default scales behind the scenes such as scale_x_continuous()
, scale_y_continuous()
, and scale_color_discrete()
.
We can test this statement by adding those functions to the end of our previous pipeline.
Using your previous code, add the scale_x_continuous()
, scale_y_continuous()
, and the scale_color_discrete()
functions to the pipeline.
... + scale_x_continuous() + ... + scale_color_discrete()
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point(aes(color = class)) + scale_x_continuous() + scale_y_continuous() + scale_color_discrete()
We can see that the two plots are the same. Note the naming scheme for scales: scale_
followed by the name of the aesthetic, then _
, then the name of the scale. The default scales are named according to the type of variable they align with: continuous
, discrete
, datetime
, or date
.
Create a new pipeline and pipe mpg
to the ggplot()
function. Within this function, map x
to displ
, y
to hwy
, and color
to drv
using aes()
. Finish with geom_point()
.
mpg |> ggplot(aes(x = ..., y = ..., color = ...)) + ...
mpg |> ggplot(aes(x = displ, y = hwy, color = drv)) + geom_point()
There are two primary arguments that affect the appearance of the ticks on the axes and the keys on the legend: breaks
and labels
. Breaks controls the position of the ticks, or the values associated with the keys.
Using your previous code, add the scale_y_continuous()
function to the pipeline. Within this function add the breaks
argument and set it to seq(15, 40, by = 5)
.
... + scale_y_continuous(breaks = ...)
mpg |> ggplot(aes(x = displ, y = hwy, color = drv)) + geom_point() + scale_y_continuous(breaks = seq(15, 40, by = 5))
Labels controls the text label associated with each tick/key. The most common use of breaks
is to override the default choice.
Using your previous code, add the scale_x_continuous()
function to the pipeline. Within this function, add the labels
argument and set it to NULL
.
... + scale_x_continuous(labels = ...)
mpg |> ggplot(aes(x = displ, y = hwy, color = drv)) + geom_point() + scale_y_continuous(breaks = seq(15, 40, by = 5)) + scale_x_continuous(labels = NULL)
You can use labels in the same way (a character vector the same length as breaks), but you can also set it to NULL to suppress the labels altogether. This can be useful for maps, or for publishing plots where you can't share the absolute numbers.
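A short sketch of the character-vector form is given below. The break positions and label text are illustrative assumptions; the only requirement is that `labels` has the same length as `breaks`.

```r
library(ggplot2)

p_custom_labels <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  scale_y_continuous(
    breaks = c(20, 30, 40),                   # three tick positions
    labels = c("20 mpg", "30 mpg", "40 mpg")  # one label per break
  )

p_custom_labels
```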
Using your previous code, add the scale_color_discrete()
function to the pipeline. Within this function, add the labels
argument and set it to c("4" = "4-wheel", "f" = "front", "r" = "rear")
.
... + scale_color_discrete(labels = ...)
mpg |> ggplot(aes(x = displ, y = hwy, color = drv)) + geom_point() + scale_y_continuous(breaks = seq(15, 40, by = 5)) + scale_x_continuous(labels = NULL) + scale_color_discrete(labels = c("4" = "4-wheel", "f" = "front", "r" = "rear"))
You can also use breaks and labels to control the appearance of legends. For discrete scales for categorical variables, labels can be a named list of the existing level names and the desired labels for them.
This time we will be using the scales package to create more efficient axis labels. Create a new pipeline and pipe diamonds
to the ggplot()
function. Within this function, map x
to price
and y
to cut
. Add the geom_boxplot()
function to the pipeline. Within this function add the alpha
argument and set it to 0.05
.
diamonds |> ggplot(aes(x = ..., y = ...)) + geom_boxplot(alpha = ...)
diamonds |> ggplot(aes(x = price, y = cut)) + geom_boxplot(alpha = 0.05)
Using alpha
to adjust the transparency of individual data points is almost always a good idea if you have thousands of observations.
Using your previous code, add the scale_x_continuous()
function. Within this function add the labels
argument and set it equal to the scales::label_dollar()
function.
... + scale_x_continuous(labels = ...)
diamonds |> ggplot(aes(x = price, y = cut)) + geom_boxplot(alpha = 0.05) + scale_x_continuous(labels = scales::label_dollar())
The labels
argument coupled with labeling functions from the scales package is also useful for formatting numbers as currency, percent, etc. The plot shows default labeling with label_dollar()
, which adds a dollar sign as well as a thousand separator comma.
If the scales package were already loaded with library()
, then it would not have been necessary to use the double colon notation, as in scales::label_dollar()
. Just label_dollar()
would have produced the same result.
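The labeling functions can also be tried on their own, outside of a plot, which is a quick way to preview what an axis will show. The sketch below is illustrative only; `in_thousands` is a name made up for this example.

```r
library(scales)

# label_dollar() returns a function; calling that function formats numbers.
label_dollar()(c(500, 1000, 15000))

# Rescale to thousands and add a "K" suffix.
in_thousands <- label_dollar(scale = 1 / 1000, suffix = "K")
in_thousands(c(5000, 15000))

# The same pattern works for other formats, such as percentages.
label_percent()(c(0.25, 0.5))
```

Any of these can be passed to the `labels` argument of a scale, for example `scale_x_continuous(labels = in_thousands)`.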
Type in presidential
and hit "Run Code".
presidential
presidential
The dataset, presidential
, gives the name of each president, the start and end dates of their term, and their party, for the 12 US presidents from Eisenhower to Trump.
Create a new pipeline and pipe presidential
to the mutate()
function. Within this function, create a new variable id
and set it equal to row_number()
. Add the ggplot()
function to the pipeline. Within this function, map x
to start
and y
to id
using aes()
. Add the geom_point()
function to the pipeline.
presidential |> mutate(id = ...) |> ggplot(aes(x = ..., ... = id)) + geom_point()
presidential |> mutate(id = row_number()) |> ggplot(aes(x = start, y = id)) + geom_point()
id
is simply the position of each presidency in the dataset. Eisenhower is the first row, Kennedy the second, and so on.
Using your previous code, add the geom_segment()
function. Within this function, map xend
to end
and yend
to id
.
... + geom_segment(aes(xend = ..., yend = ...))
presidential |> mutate(id = row_number()) |> ggplot(aes(x = start, y = id)) + geom_point() + geom_segment(aes(xend = end, yend = id))
Another use of breaks is when you have relatively few data points and want to highlight exactly where the observations occur.
Using your previous code, add the scale_x_date()
function. Within this function add the argument name
and set it equal to NULL
.
... + scale_x_date(name = ...)
presidential |> mutate(id = row_number()) |> ggplot(aes(x = start, y = id)) + geom_point() + geom_segment(aes(xend = end, yend = id)) + scale_x_date(name = NULL)
We don't need to explicitly label the x-axis since the context makes it clear that these are years.
Using your previous code, within your call to the function scale_x_date()
, add the argument breaks
and set it equal to presidential$start
.
... + scale_x_date(name = NULL, breaks = ...)
presidential |> mutate(id = row_number()) |> ggplot(aes(x = start, y = id)) + geom_point() + geom_segment(aes(xend = end, yend = id)) + scale_x_date(name = NULL, breaks = presidential$start)
Note that for the breaks
argument we pulled out the start variable as a vector with presidential$start
because we can’t do an aesthetic mapping for this argument. Also note that the specification of breaks
and labels
for date and datetime scales is a little different.
Using your previous code, within your call to the function scale_x_date()
, add the argument date_labels
and set it equal to "'%y"
.
... + scale_x_date(name = NULL, breaks = presidential$start, ... = "'%y")
presidential |> mutate(id = row_number()) |> ggplot(aes(x = start, y = id)) + geom_point() + geom_segment(aes(xend = end, yend = id)) + scale_x_date(name = NULL, breaks = presidential$start, date_labels = "'%y")
The argument date_labels
takes a format specification, in the same form as parse_datetime()
. The plot shows when each US president started and ended their term.
In this section we will learn how to adjust the placement of the legend and how to customize the legend of your plot.
First create a new variable called base
. Set base
equal to the dataset mpg
piped to the ggplot()
function. Within the aes()
for this function map x
to displ
and y
to hwy
. Add geom_point()
with aes(color = class)
base <- mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point(aes(color = class))
base <- mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point(aes(color = class))
To control the overall position of the legend, you need to use a theme()
setting. The theme setting legend.position
controls where the legend is drawn. The default is legend.position = "right"
.
Connect base
, which is the ggplot2 object you have created, to theme(legend.position = "left")
using a +
.
Note that, colloquially, we will often use the verb "pipe" in this context. That is, we "pipe" each component of a plot together, just as we "pipe" data cleaning steps together, even though, in the former, we use a +
to connect the statements.
base + theme(... = "left")
base + theme(legend.position = "left")
The legend's position is now to the left of the plot.
Using your previous code, within your call to the function theme()
change the argument's definition from "left"
to "top"
.
base + theme(legend.position = ...)
base + theme(legend.position = "top")
We can see that the legend is taking too much space at the top of the plot. So let's adjust the way the legend is displayed.
Using your previous code, add the guides()
function to the pipeline. Within this function, add the color
argument and set it equal to guide_legend(nrow = 3)
.
... + guides(color = ...(nrow = ...))
base + theme(legend.position = "top") + guides(color = guide_legend(nrow = 3))
This fixes our previous problem. If your plot is short and wide, place the legend at the top or bottom, and if it’s tall and narrow, place the legend at the left or right. You can also use legend.position = "none"
to suppress the display of the legend altogether.
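As a further sketch of what `guide_legend()` can do, the example below moves the legend to the bottom, arranges it in two rows, and enlarges the points shown in the legend keys with `override.aes`. The specific values (two rows, size 4) are arbitrary choices for illustration, and `base` is recreated here so the example runs on its own.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class))

p_legend <- base +
  theme(legend.position = "bottom") +
  guides(
    color = guide_legend(
      nrow = 2,                       # split the keys across two rows
      override.aes = list(size = 4)   # larger points in the legend only
    )
  )

p_legend
```

`override.aes` changes only how the legend keys are drawn; the points in the plot itself keep their original size.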
Instead of just tweaking the details a little, you can replace the scale altogether. There are two types of scales you’re most likely to want to switch out: continuous position scales and color scales. Fortunately, the same principles apply to all the other aesthetics, so once you’ve mastered position and color, you’ll be able to quickly pick up other scale replacements.
We will be working with a dataset called diamonds
. Type in diamonds
and hit "Run Code".
diamonds
diamonds
The dataset diamonds
is a dataset containing the prices and other attributes of almost 54,000 diamonds.
Create a new pipeline. Pipe diamonds
to the ggplot()
function. Within this function map x
to carat
and y
to price
.
diamonds |> ggplot(aes(x = ..., y = ...))
diamonds |> ggplot(aes(x = carat, y = price))
Using your previous code, add the geom_bin2d()
function to the pipeline.
... + geom_bin2d()
diamonds |> ggplot(aes(x = carat, y = price)) + geom_bin2d()
It’s very useful to plot transformations of your variable. For example, it’s easier to see the precise relationship between carat and price if we log transform them.
Using your previous code, within the ggplot()
function, add the log10()
function to carat
and price
.
diamonds |> ggplot(aes(x = log10(...), y = ...)) + geom_bin2d()
diamonds |> ggplot(aes(x = log10(carat), y = log10(price))) + geom_bin2d()
However, the disadvantage of this transformation is that the axes are now labelled with the transformed values, making it hard to interpret the plot. Instead of doing the transformation in the aesthetic mapping, we can instead do it with the scale.
Using your previous code, within the ggplot()
function remove the log10()
function around carat
and price
. Then add the function scale_x_log10()
to the pipeline.
diamonds |> ggplot(aes(x = ..., y = ...)) + geom_bin2d() + scale_x_log10()
diamonds |> ggplot(aes(x = carat, y = price)) + geom_bin2d() + scale_x_log10()
Look at the x-axis labels. Note how the labelled values are no longer linearly spaced: in data units, the gap between the second and third labels is almost three times bigger than the gap between the first and second, even though the labels sit roughly the same distance apart on the axis.
Using your previous code, add the scale_y_log10()
function to the pipeline.
... + scale_y_log10()
diamonds |> ggplot(aes(x = carat, y = price)) + geom_bin2d() + scale_x_log10() + scale_y_log10()
This plot is much easier on the eyes due to the axes being labeled in units which make sense to the viewer.
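The claim that the scale does the same transformation as log10() in the mapping can be checked directly. The following is a minimal sketch, not part of the exercises: it swaps in geom_point() for geom_bin2d() so that the comparison is row by row, and the names p_aes, p_scale, pos_aes, and pos_scale are invented for this illustration.

```r
library(ggplot2)

# Same data drawn two ways: transform in the mapping vs. transform in the scale.
p_aes <- ggplot(diamonds, aes(x = log10(carat), y = log10(price))) +
  geom_point()
p_scale <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()

# layer_data() returns the values that ggplot2 positions on the panel.
pos_aes <- layer_data(p_aes)
pos_scale <- layer_data(p_scale)

# Compare the panel positions produced by the two approaches.
all.equal(pos_aes$x, pos_scale$x)
all.equal(pos_aes$y, pos_scale$y)
```

If the positions agree, the only difference between the two plots is how the axes are labelled.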
Create a new pipeline. Pipe mpg
to the ggplot()
function. Within this function map x
to displ
and y
to hwy
. Add geom_point()
to the pipeline and, within it, use aes()
to map color
to drv
.
... + geom_point(aes(color = ...))
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point(aes(color = drv))
Another scale that is frequently customized is color. The default categorical scale picks colors that are evenly spaced around the color wheel. Useful alternatives are the ColorBrewer scales which have been hand-tuned to work better for people with common types of color blindness.
Using your previous code, add a scale function, scale_color_brewer()
to the pipeline. Within this function, add the argument palette
and set it equal to "Set1"
.
... + scale_color_brewer(palette = ...)
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point(aes(color = drv)) + scale_color_brewer(palette = "Set1")
Don’t forget simpler techniques for improving accessibility. If there are just a few colors, you can add a redundant shape mapping. This will also help ensure your plot is interpretable in black and white.
Using your previous code, within your call to the geom_point()
function, add to your mapping by setting shape
equal to drv
.
... + geom_point(aes(color = drv, shape = ...)) + scale_color_brewer(palette = "Set1")
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point(aes(color = drv, shape = drv)) + scale_color_brewer(palette = "Set1")
The ColorBrewer scales are documented online at https://colorbrewer2.org/ and made available in R via the RColorBrewer package, by Erich Neuwirth.
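To explore the palettes without leaving R, you can query the RColorBrewer package itself. This is a small sketch that assumes RColorBrewer is installed (it normally arrives as a dependency of scales); the names friendly and set1_colors are invented for this illustration.

```r
library(RColorBrewer)

# brewer.pal.info is a data frame with one row per palette; its
# colorblind column flags the palettes rated as colorblind friendly.
friendly <- brewer.pal.info[brewer.pal.info$colorblind, ]
rownames(friendly)

# The hex codes that the "Set1" palette supplies for three groups.
set1_colors <- brewer.pal(n = 3, name = "Set1")
set1_colors
```

Any palette name listed there can be passed to the palette argument of scale_color_brewer().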
Create a new pipeline and pipe the dataset presidential
to the mutate()
function. Within the mutate()
function, create a new variable called id
and set it equal to 33
plus row_number()
. Add the ggplot()
function to the pipeline and map x
to start
, y
to id
, and color
to party
. Add geom_point()
.
presidential |> mutate(id = 33 + ...)
presidential |> mutate(id = 33 + row_number()) |> ggplot(aes(x = start, y = id, color = party)) + geom_point()
For continuous color, you can use the built-in scale_color_gradient()
or scale_fill_gradient()
.
Add geom_segment()
to the pipeline. Within aes()
, set the xend
argument to end
and the yend
argument to id
.
... + geom_segment(aes(xend = ..., yend = ...))
presidential |> mutate(id = 33 + row_number()) |> ggplot(aes(x = start, y = id, color = party)) + geom_point() + geom_segment(aes(xend = end, yend = id))
If you have a diverging scale, you can use scale_color_gradient2()
. That allows you to give, for example, positive and negative values different colors. That’s sometimes also useful if you want to distinguish points above or below the mean.
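As an illustration of the diverging case, the sketch below colors each car by whether its highway mileage is above or below the mean. It is not part of the exercises; the choice of mpg, the three colors, and the names hwy_mean and p_diverging are assumptions made for this example.

```r
library(ggplot2)

# Center the diverging scale on the mean highway mileage.
hwy_mean <- mean(mpg$hwy)

p_diverging <- ggplot(mpg, aes(x = displ, y = hwy, color = hwy)) +
  geom_point() +
  scale_color_gradient2(
    low = "blue",       # values below the midpoint
    mid = "grey80",     # values near the midpoint
    high = "red",       # values above the midpoint
    midpoint = hwy_mean
  )

p_diverging
```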
Using your previous code, add the scale_color_manual()
function to the pipeline. Within this function add the argument values
and set it equal to c(Republican = "#E81B23", Democratic = "#00AEF3")
.
... + scale_color_manual(values = ...)
presidential |> mutate(id = 33 + row_number()) |> ggplot(aes(x = start, y = id, color = party)) + geom_point() + geom_segment(aes(xend = end, yend = id)) + scale_color_manual(values = c(Republican = "#E81B23", Democratic = "#00AEF3"))
To map presidential party to color, we want to use the standard mapping of red for Republicans and blue for Democrats. One approach for assigning these colors is to use hex color codes, as shown above.
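Because the values vector is named, each color is matched to a party by name rather than by position. A quick check like the one below, offered as a sketch with the invented names party_colors and missing_parties, confirms that every party in the data has a color assigned.

```r
library(ggplot2)

# Named vector: names must match the values found in presidential$party.
party_colors <- c(Republican = "#E81B23", Democratic = "#00AEF3")

# Any party listed here would have no color assigned to it.
missing_parties <- setdiff(unique(presidential$party), names(party_colors))
missing_parties
```

An empty result means the mapping is complete.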
For our next plot, we will create a plot with the dataset random_vals
. Type in random_vals
and hit "Run Code".
random_vals
random_vals
Another option is to use the viridis color scales. The designers, Nathaniel Smith and Stéfan van der Walt, carefully tailored continuous color schemes that are perceptible to people with various forms of color blindness as well as perceptually uniform in both color and black and white.
Pipe random_vals
to ggplot()
. Within ggplot()
, map x
to random_x
and y
to random_y
. Add geom_hex()
and coord_fixed()
to the pipeline.
random_vals |> ggplot(aes(x = ..., ... = random_y)) + ... + coord_fixed()
random_vals |> ggplot(aes(x = random_x, y = random_y)) + geom_hex() + coord_fixed()
viridis scales are available as continuous (c), discrete (d), and binned (b) palettes in ggplot2.
Using your previous code, add the scale_fill_viridis_c()
function to the pipeline.
... + scale_fill_viridis_c()
random_vals |> ggplot(aes(x = random_x, y = random_y)) + geom_hex() + coord_fixed() + scale_fill_viridis_c()
Note that all color scales come in two varieties: scale_color_*() and scale_fill_*() for the color and fill aesthetics respectively (the color scales are available in both UK and US spellings).
Using your previous code, change the scale_fill_viridis_c()
to scale_fill_viridis_b()
.
... + scale_fill_viridis_b()
random_vals |> ggplot(aes(x = random_x, y = random_y)) + geom_hex() + coord_fixed() + scale_fill_viridis_b()
As you can see, the color gradient now looks chunky, unlike the continuous plot before. This is because the binned scale groups the counts into a few discrete steps of the viridis palette.
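The exercises above cover the continuous (c) and binned (b) variants. For completeness, here is a minimal sketch of the discrete (d) variant applied to a categorical variable. The choice of mpg and drv, and the name p_discrete, are assumptions made for this illustration.

```r
library(ggplot2)

# drv is categorical, so the discrete viridis scale applies.
# The color aesthetic is used here, hence scale_color_*() rather than scale_fill_*().
p_discrete <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  scale_color_viridis_d()

p_discrete
```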
There are three ways to control the plot limits: adjusting what data are plotted, setting the limits in each scale, and setting xlim and ylim in coord_cartesian()
. The last of these is probably what you want to use in most cases because it ensures that any modelling is performed on the entire data set.
Pipe mpg
to ggplot()
, using aes()
, map x
to displ
and y
to hwy
. Add geom_point()
to the pipeline. Within your call to geom_point()
, using aes()
, map color
to drv
. Add geom_smooth()
to the pipeline.
... |> ggplot(aes(... = displ, y = ...)) + geom_point(...(color = drv)) + geom_smooth()
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point(aes(color = drv)) + geom_smooth()
This plot shows the relationship between engine size (displ
) and fuel efficiency (hwy
), colored by type of drive train (drv
).
Using the same pipe, add filter()
to the pipe in between the beginning of the pipe and your call to ggplot()
. Within filter()
, include displ
greater than or equal to 5
and displ
less than or equal to 6
.
mpg |> filter(displ >= ... & displ <= ...) |> ...
mpg |> filter(displ >= 5 & displ <= 6) |> ggplot(aes(x = displ, y = hwy)) + geom_point(aes(color = drv)) + geom_smooth()
One way to zoom in on a plot is to decrease the range of the data which is plotted. The "problem" with this approach is that only the remaining data is used in geoms like geom_smooth()
. This is probably not what you want. Subsetting the data has affected the x
and y
scales as well as the smooth curve.
Remove the filter()
line in your pipeline.
There exists a limits
argument on individual scales like scale_x_continuous()
and scale_y_continuous()
. Reducing the limits is equivalent to subsetting the data. To see this, add scale_x_continuous()
with the limits
argument set to c(5, 6)
and scale_y_continuous()
with the limits
argument set to c(10, 25)
to the pipeline.
... + scale_x_continuous(limits = ...) + scale_y_continuous(... = c(10, 25))
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point(aes(color = drv)) + geom_smooth() + scale_x_continuous(limits = c(5, 6)) + scale_y_continuous(limits = c(10, 25))
As you can see, setting the limits in each scale is another way to control the plot limits. Since this is probably not what we want to do, ggplot2 generates warnings about the rows it removed, reminding us about all the data which we are ignoring.
Using your previous pipeline, delete scale_x_continuous()
and scale_y_continuous()
.
Add coord_cartesian()
, setting the xlim
argument to c(5, 6)
and the ylim
argument to c(10, 25)
.
... + coord_cartesian(xlim = c(5, 6), ylim = c(10, 25))
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point(aes(color = drv)) + geom_smooth() + coord_cartesian(xlim = c(5, 6), ylim = c(10, 25))
To zoom in on a region of the plot, it’s generally best to use coord_cartesian()
.
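The difference between the two approaches shows up in the data behind the smooth curve. The sketch below is illustrative only: the method and formula arguments are spelled out just to silence the message geom_smooth() prints when it picks defaults, and the names base_plot, p_limits, p_zoom, smooth_limits, and smooth_zoom are invented for this example.

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = drv)) +
  geom_smooth(method = "loess", formula = y ~ x)

# Scale limits drop out-of-range rows before the smooth is fitted.
p_limits <- base_plot +
  scale_x_continuous(limits = c(5, 6)) +
  scale_y_continuous(limits = c(10, 25))

# coord_cartesian() fits the smooth on all rows, then zooms the view.
p_zoom <- base_plot +
  coord_cartesian(xlim = c(5, 6), ylim = c(10, 25))

# Layer 2 is the smooth; inspect the x values it was evaluated on.
smooth_limits <- suppressWarnings(layer_data(p_limits, 2))
smooth_zoom <- layer_data(p_zoom, 2)

range(smooth_limits$x)
range(smooth_zoom$x)
```

With coord_cartesian() the curve still spans the full range of displ, which is why the visible segment reflects the overall trend rather than a refit on a handful of points.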
On the other hand, setting the limits on individual scales is generally more useful if you want to expand the limits, e.g., to match scales across different plots.
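A sketch of that use case: two plots of different subsets share the same scales so they can be compared side by side. The approach follows the one in R for Data Science; the names x_scale, y_scale, col_scale, p_suv, and p_compact are invented for this illustration.

```r
library(ggplot2)
library(dplyr)

suv <- mpg |> filter(class == "suv")
compact <- mpg |> filter(class == "compact")

# Build the scales once from the full data, then reuse them in both plots.
x_scale <- scale_x_continuous(limits = range(mpg$displ))
y_scale <- scale_y_continuous(limits = range(mpg$hwy))
col_scale <- scale_color_discrete(limits = unique(mpg$drv))

p_suv <- ggplot(suv, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  x_scale + y_scale + col_scale

p_compact <- ggplot(compact, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  x_scale + y_scale + col_scale

p_suv
p_compact
```

Because the limits come from the full mpg data, both plots use the same axis ranges and the same color for each drive train, even though each subset covers only part of the range.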
In this section, we will learn about how to customize the non-data elements of your plot with a theme.
The ggplot2 package includes eight themes, with theme_gray()
as the default. Many more are included in add-on packages such as ggthemes (https://jrnold.github.io/ggthemes), by Jeffrey Arnold. You can also create your own themes, if you are trying to match a particular corporate or journal style.
Pipe mpg
to ggplot()
. Within aes()
, set x = displ
and y = hwy
. Follow with geom_point(aes(color = class))
. Last line is geom_smooth(se = FALSE)
.
... |> ggplot(aes(x = ..., ... = hwy)) + geom_point(...(color = class)) + geom_smooth(se = ...)
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point(aes(color = class)) + geom_smooth(se = FALSE)
This plot shows the theme_gray()
default.
Many people wonder why the default theme has a gray background. This was a deliberate choice because it puts the data forward while still making the grid lines visible. The white grid lines are visible (which is important because they significantly aid position judgments), but they have little visual impact and we can easily tune them out. The gray background gives the plot a similar typographic color to the text, ensuring that the graphics fit in with the flow of a document without jumping out with a bright white background. Finally, the gray background creates a continuous field of color which ensures that the plot is perceived as a single visual entity.
Add theme_bw()
to the pipeline.
... + theme_bw()
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point(aes(color = class)) + geom_smooth(se = FALSE) + theme_bw()
This is the classic dark-on-light ggplot2 theme. This theme may work better for presentations displayed with a projector.
Replace theme_bw()
with theme_linedraw()
.
... + theme_linedraw()
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point(aes(color = class)) + geom_smooth(se = FALSE) + theme_linedraw()
A theme with only black lines of various widths on white backgrounds, reminiscent of a line drawing. Serves a purpose similar to theme_bw()
.
Replace theme_linedraw()
with theme_light()
.
... + theme_light()
mpg |> ggplot(aes(x = displ, y = hwy)) + geom_point(aes(color = class)) + geom_smooth(se = FALSE) + theme_light()
A theme similar to theme_linedraw()
but with light grey lines and axes, to direct more attention towards the data. There are four other themes built into ggplot2.
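The four remaining built-in themes are theme_dark(), theme_minimal(), theme_classic(), and theme_void(). One way to try them all at once is sketched below. The names base_plot, other_themes, and themed_plots are invented for this example, and the method and formula arguments are only there to silence the message geom_smooth() prints when it picks defaults.

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(se = FALSE, method = "loess", formula = y ~ x)

# The four complete themes not covered by the exercises above.
other_themes <- list(
  dark = theme_dark(),
  minimal = theme_minimal(),
  classic = theme_classic(),
  void = theme_void()
)

# Apply each theme to the same base plot.
themed_plots <- lapply(other_themes, function(th) base_plot + th)

themed_plots$minimal
```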
Having learned about the ggplot2 themes, we will now create this graphic:
themes_plot <- mpg |> ggplot(aes(x = displ, y = hwy, color = drv)) + geom_point() + labs( title = "Larger engine sizes tend to have lower fuel economy", caption = "Source: https://fueleconomy.gov." ) + theme( legend.position = c(0.6, 0.7), legend.direction = "horizontal", legend.box.background = element_rect(color = "black"), plot.title = element_text(face = "bold"), plot.title.position = "plot", plot.caption.position = "plot", plot.caption = element_text(hjust = 0) )
themes_plot
Start a new pipeline with
ggplot(mpg, aes(x = displ, y = hwy, color = drv)) + geom_point()
ggplot(mpg, aes(x = displ, y = hwy, color = drv)) + geom_point()
ggplot(mpg, aes(x = displ, y = hwy, color = drv)) + geom_point()
It’s possible to control individual components of each theme, like the size and color of the font used for the y
axis. We’ve already seen that legend.position
controls where the legend is drawn. There are many other aspects of the legend that can be customized with theme()
.
Using labs()
, set title
to "Larger engine sizes tend to have lower fuel economy"
and caption
to "Source: https://fueleconomy.gov."
.
... + labs(title = ..., caption = ...)
ggplot(mpg, aes(x = displ, y = hwy, color = drv)) + geom_point() + labs( title = "Larger engine sizes tend to have lower fuel economy", caption = "Source: https://fueleconomy.gov." )
In the plot which we are trying to create, we need to change the direction of the legend as well as put a black border around it. The theme()
function provides us with the ability to make very detailed changes in the plot.
Add theme()
to the pipeline. Within theme()
, add the legend.position
argument and set it equal to c(0.6, 0.7)
.
... + theme( legend.position = ... )
ggplot(mpg, aes(x = displ, y = hwy, color = drv)) + geom_point() + labs( title = "Larger engine sizes tend to have lower fuel economy", caption = "Source: https://fueleconomy.gov." ) + theme( legend.position = c(0.6, 0.7) )
The legend.position
argument sets the position of legends ("none", "left", "right", "bottom", "top", or a two-element numeric vector).
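One caveat, stated as an assumption about your installed version: from ggplot2 3.5.0 onward, passing a numeric vector to legend.position is deprecated, and the coordinates go in a separate legend.position.inside argument instead. A sketch of the newer spelling, with the invented name p_inside, looks like this:

```r
library(ggplot2)

# Assumes ggplot2 3.5.0 or later; older versions do not know
# the legend.position.inside theme element.
p_inside <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  theme(
    legend.position = "inside",
    legend.position.inside = c(0.6, 0.7)
  )

p_inside
```

The exercises in this tutorial keep the older form, which still works but may print a deprecation warning on recent versions.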
Within theme()
, add the legend.direction
argument and set it equal to "horizontal"
.
... + theme( legend.position = c(0.6, 0.7), legend.direction = ... )
ggplot(mpg, aes(x = displ, y = hwy, color = drv)) + geom_point() + labs( title = "Larger engine sizes tend to have lower fuel economy", caption = "Source: https://fueleconomy.gov." ) + theme( legend.position = c(0.6, 0.7), legend.direction = "horizontal" )
The legend.direction
argument sets the layout of items in legends ("horizontal" or "vertical"). Note the change in the legend in this plot relative to the previous one.
Within theme()
, add the legend.box.background
argument and set it equal to element_rect(color = "black")
.
... + theme( legend.position = ..., legend.direction = ..., legend.box.background = ... )
ggplot(mpg, aes(x = displ, y = hwy, color = drv)) + geom_point() + labs( title = "Larger engine sizes tend to have lower fuel economy", caption = "Source: https://fueleconomy.gov." ) + theme( legend.position = c(0.6, 0.7), legend.direction = "horizontal", legend.box.background = element_rect(color = "black") )
Note that customization of the legend box and plot title elements of the theme is done with element_*()
functions. These functions specify the styling of non-data components, e.g., the legend border color is defined in the color argument of element_rect()
.
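To make the element family concrete, here is a sketch that uses each of the four constructors once. The particular theme elements and values are arbitrary choices for illustration, and the name p_elements is invented.

```r
library(ggplot2)

p_elements <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  theme(
    # Text elements: font size and color of the y axis labels.
    axis.text.y = element_text(size = 12, color = "grey30"),
    # Line elements: draw the axis lines.
    axis.line = element_line(color = "grey40"),
    # Rectangle elements: border around the legend box.
    legend.box.background = element_rect(color = "black"),
    # element_blank() removes a component entirely.
    panel.grid.minor = element_blank()
  )

p_elements
```

Each theme argument expects a particular element type, so element_text() goes with text components, element_line() with lines, and element_rect() with backgrounds and borders.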
Within theme()
, add the plot.title
argument and set it equal to element_text(face = "bold")
.
... + theme( legend.position = ..., legend.direction = ..., legend.box.background = ..., plot.title = ... )
ggplot(mpg, aes(x = displ, y = hwy, color = drv)) + geom_point() + labs( title = "Larger engine sizes tend to have lower fuel economy", caption = "Source: https://fueleconomy.gov." ) + theme( legend.position = c(0.6, 0.7), legend.direction = "horizontal", legend.box.background = element_rect(color = "black"), plot.title = element_text(face = "bold") )
The plot.title
argument controls the text appearance of the plot title. It takes an element_text()
value, inherits from title, and is left-aligned by default.
Within theme()
, let's make three more additions: Set plot.title.position
to "plot"
, plot.caption.position
to "plot"
, and plot.caption
to element_text(hjust = 0)
.
... + theme( legend.position = ..., legend.direction = ..., legend.box.background = ..., plot.title = ..., plot.title.position = ..., plot.caption.position = ..., plot.caption = ... )
ggplot(mpg, aes(x = displ, y = hwy, color = drv)) + geom_point() + labs( title = "Larger engine sizes tend to have lower fuel economy", caption = "Source: https://fueleconomy.gov." ) + theme( legend.position = c(0.6, 0.7), legend.direction = "horizontal", legend.box.background = element_rect(color = "black"), plot.title = element_text(face = "bold"), plot.title.position = "plot", plot.caption.position = "plot", plot.caption = element_text(hjust = 0) )
The theme elements that control the position of the title and the caption are plot.title.position
and plot.caption.position
, respectively. In this plot, these are set to "plot"
to indicate these elements are aligned to the entire plot area, instead of the plot panel (the default).
The plot.caption
argument with value element_text(hjust = 0)
places the caption flush left. It would otherwise be right-aligned by default.
Reminder: This is what our plot should look like:
themes_plot
For an overview of all theme()
components, see help with ?theme
. The ggplot2 book is also a great place to go for the full details on theming.
So far we talked about how to create and modify a single plot. What if you have multiple plots you want to lay out in a certain way? The patchwork package allows you to combine separate plots into the same graphic.
We have created several plots (p1
through p5
) for you to use in these exercises. Type p1
and hit "Run Code".
p1
p1
The ggplot2 package provides a strong API for sequentially building up a plot, but does not concern itself with composition of multiple plots. patchwork is a package that expands the API to allow for arbitrarily complex composition of plots by, among others, providing mathematical operators for combining multiple plots.
Enter p1
on the first line and p2
on the second line. Hit "Run Code".
...
p2
p1
p2
To place two plots next to each other, you can simply add them to each other, if you have loaded the patchwork package. Note that you first need to create the plots and save them as objects (in the following example they’re called p1
and p2
). Then, you place them next to each other with +
.
Type p1 + p2
and hit "Run Code".
p1 + ...
p1 + p2
It’s important to note that in the above code chunk we did not use a new function from the patchwork package. Instead, the package added new functionality to the +
operator.
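You can see this by inspecting the result of the addition. In the sketch below, p1 and p2 are rebuilt as in the tutorial setup so the block runs on its own, and the name combined is invented for this example.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mpg, aes(x = drv, y = cty, color = drv)) +
  geom_boxplot(show.legend = FALSE) +
  labs(title = "Plot 1")
p2 <- ggplot(mpg, aes(x = drv, y = hwy, color = drv)) +
  geom_boxplot(show.legend = FALSE) +
  labs(title = "Plot 2")

# Adding two ggplot objects returns a patchwork object, not a plain ggplot.
combined <- p1 + p2
class(combined)

combined
```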
You can also create complex plot layouts with patchwork. Type (p1 | p3) / p2
and hit "Run Code".
(... | p3) / ...
(p1 | p3) / p2
|
places p1
and p3
next to each other and /
moves p2
to the next line.
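The parentheses matter because R gives / higher precedence than |. The sketch below rebuilds p1, p2, and p3 as in the tutorial setup so it runs on its own, and uses quote() to show how R parses the unparenthesized version. The names with_parens and without_parens are invented for this example.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mpg, aes(x = drv, y = cty, color = drv)) +
  geom_boxplot(show.legend = FALSE) + labs(title = "Plot 1")
p2 <- ggplot(mpg, aes(x = drv, y = hwy, color = drv)) +
  geom_boxplot(show.legend = FALSE) + labs(title = "Plot 2")
p3 <- ggplot(mpg, aes(x = cty, color = drv, fill = drv)) +
  geom_density(alpha = 0.5) + labs(title = "Plot 3")

# Top row: p1 beside p3. Bottom row: p2.
with_parens <- (p1 | p3) / p2

# Without parentheses, / is applied first: p1 beside a stack of p3 over p2.
without_parens <- p1 | p3 / p2

# The outermost call in the unparenthesized expression is `|`, not `/`.
quote(p1 | p3 / p2)[[1]]
```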
Additionally, patchwork allows you to collect legends from multiple plots into one common legend, customize the placement of the legend as well as dimensions of the plots, and add a common title, subtitle, caption, etc. to your plots. Start with (p1 + p2) / (p3 + p4) / p5
.
(p1 + p2) ... (p3 + p4) / ...
(p1 + p2) / (p3 + p4) / p5
This creates a collection of plots with two plots in the first row, two in the second, and then a single plot in the third row.
Modify the code by adding guide_area()
to the front of the call.
... / (p1 + p2) / (p3 + p4) / p5
guide_area() / (p1 + p2) / (p3 + p4) / p5
By default plot guides (like legends) will be put on the side as with regular plots, but by adding a guide_area()
to the plot you can tell patchwork to place the guides in that area instead.
By design, only the density plots (p3 and p4) keep their legends, because we want a single legend for the entire display. We also want a single title. To add this, we need to use the patchwork function plot_annotation()
. Add it to our plot, setting the title
argument to "City and highway mileage for cars with different drive trains"
and the caption
argument to "Source: https://fueleconomy.gov."
.
guide_area() / (p1 + p2) / (p3 + p4) / p5 + ...( title = "City and highway mileage for cars with different drive trains", caption = ... )
guide_area() / (p1 + p2) / (p3 + p4) / p5 + plot_annotation( title = "City and highway mileage for cars with different drive trains", caption = "Source: https://fueleconomy.gov." )
patchwork has several functions which do for a collection of plots what ggplot2 functions do for individual plots.
plot_layout()
provides fine-grained control over the details of how the plots are put together. Add plot_layout()
with the guides
argument set to "collect"
and the heights
argument set to c(1, 3, 2, 4)
.
... + plot_layout( ... = "collect", heights = ... )
guide_area() / (p1 + p2) / (p3 + p4) / p5 + plot_annotation( title = "City and highway mileage for cars with different drive trains", caption = "Source: https://fueleconomy.gov." ) + plot_layout( guides = "collect", heights = c(1, 3, 2, 4) )
We have customized the heights of the various components of our patchwork: the guide has a height of 1, the box plots 3, density plots 2, and the faceted scatterplot 4. patchwork divides up the area you have allotted for your plot using this scale and places the components accordingly.
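Since the heights are relative, a little arithmetic shows roughly what share of the vertical space each row receives. This is a sketch that ignores the space taken by the title and caption; the names heights and shares are invented for this example.

```r
# Relative heights passed to plot_layout(), labelled by row.
heights <- c(guide = 1, boxplots = 3, densities = 2, scatterplot = 4)

# Each row's share is its height divided by the total.
shares <- heights / sum(heights)
shares
```

Doubling every value in heights would leave the layout unchanged, because only the ratios matter.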
The top of the collection is still a bit messed up. We can fix this by adding theme(legend.position = "top")
as the last line. WARNING: Instead of using a +
sign to connect this line to the rest of the code, you need to use an &
.
... & theme(legend.position = "top")
guide_area() / (p1 + p2) / (p3 + p4) / p5 + plot_annotation( title = "City and highway mileage for cars with different drive trains", caption = "Source: https://fueleconomy.gov." ) + plot_layout( guides = "collect", heights = c(1, 3, 2, 4) ) & theme(legend.position = "top")
Note the use of the & operator here instead of the usual +. This is because we’re modifying the theme for the patchwork plot as opposed to the individual ggplots. The legend is placed on top, inside the guide_area().
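The contrast between the two operators can be sketched with a smaller patchwork. In patchwork, + adds to the most recently added plot, while & adds to every plot in the collection. Here p1 and p2 are rebuilt as in the tutorial setup, and the names patch_plus and patch_amp are invented for this example.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mpg, aes(x = drv, y = cty, color = drv)) +
  geom_boxplot(show.legend = FALSE) + labs(title = "Plot 1")
p2 <- ggplot(mpg, aes(x = drv, y = hwy, color = drv)) +
  geom_boxplot(show.legend = FALSE) + labs(title = "Plot 2")

# With +, the theme is applied to the last plot (p2) only.
patch_plus <- (p1 + p2) + theme_minimal()

# With &, the theme is applied to every plot in the patchwork.
patch_amp <- (p1 + p2) & theme_minimal()

patch_plus
patch_amp
```

Comparing the two outputs side by side makes the difference visible: only the second has a consistent look across both panels.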
This tutorial covered Chapter 11: Communication from R for Data Science (2e) by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund. In this tutorial we made use of three packages associated with ggplot2: scales, ggrepel, and patchwork. Key commands included quote()
which simply returns its argument and geom_label_repel()
which adds text directly to the plot.
ggplot2: Elegant Graphics for Data Analysis (3e) by Hadley Wickham, Danielle Navarro, and Thomas Lin Pedersen is the best source for all the details about making beautiful graphics with ggplot2.