library(learnr)
library(intRo)
data("endangered")
data("slavery_tot")
data("slavery")
data("harry_potter")
data("land_use")
knitr::opts_chunk$set(echo = FALSE)
The tidyverse package ggplot2 provides users with a consistent set of functions to create captivating graphics. To be able to use the functions in a package, you first need to attach the package. Here, we can attach the entire tidyverse, which includes ggplot2.
You only have to attach packages once, when you start your R session in RStudio. They will stay attached until you close RStudio and end the R session.
Attach the tidyverse now.
library(tidyverse)
These are the minimum constituents of a ggplot.
The data: you have to specify the data frame with the data you want to plot.
The mapping: the mapping tells ggplot how to map data to parts of the plot like the axes or groupings within the data. These parts are called aesthetics, or aes for short.
You can specify the data and mapping with the data
and mapping
arguments of the ggplot()
function.
Note that the mapping
argument is always specified with aes()
.
In the following bare plot, we are just mapping the total sleep time (sleep_total) to the x-axis, and the REM sleep time (sleep_rem) to the y-axis.
ggplot(
  data = msleep,
  mapping = aes(x = sleep_total, y = sleep_rem)
)
Nice, but we are missing the most important part: showing the data!
Data is represented with geometries, or geoms for short.
Geoms are added to the base ggplot with functions whose names all start with geom_.
For this plot, you want to use geom_point()
.
This geom simply adds points to the plot based on the data in the msleep
data frame.
Add the geom to the following code and run it.
Note that the names of the data and mapping arguments are left implicit in the ggplot() function: they are its first two arguments and they are specified in that order (try running ?ggplot in the console to see all the arguments of the function).
ggplot( msleep, aes(sleep_total, sleep_rem) ) + ...
This type of plot, with two continuous axes and data represented by points, is called a scatter plot.
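As a minimal sketch of a complete scatter plot, the following uses the mpg data frame that ships with ggplot2 instead of msleep, so the exercise above is left for you to solve. The choice of the displ and hwy columns and the object name p are assumptions made only for illustration.

```r
library(ggplot2)  # mpg is a data frame bundled with ggplot2

# Base plot (data + mapping) with a point geometry added on top
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

p  # printing the object draws the scatter plot
```

Storing the plot in an object is optional. Typing the ggplot() call directly has the same effect as printing p.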
Another common type of plot is the bar chart.
Bar charts are useful when you are counting things. In the following example, we will be counting the number of languages by their endangerment status.
There are thousands of languages in the world, but most of them are losing speakers, and some are already no longer spoken.
The endangerment status of a language goes from not endangered
(languages with large populations of speakers) through threatened
, shifting
and nearly extinct
, to extinct
(languages that have no living speakers left).
The endangered
data frame contains the endangerment status for 7,845 languages from Glottolog.
Here's what it looks like.
endangered
To create a bar chart, add geom_bar()
to the plot.
You only need to specify one axis, the x-axis to be precise, because the y-axis will always show the counts.
endangered %>% ggplot(aes(status)) + geom_bar()
Note how the counting is done automatically.
R looks in the status
column and counts how many times each value in the column occurs in the data frame.
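To see what this automatic counting amounts to, here is a small sketch that computes the same kind of counts by hand with dplyr's count(). It uses the mpg data frame bundled with ggplot2 as a stand-in for endangered, and the name class_counts is only illustrative.

```r
library(ggplot2)
library(dplyr)

# Count how many rows there are for each value of the class column
class_counts <- count(mpg, class)
class_counts

# The bar heights in this chart correspond to the n column above
ggplot(mpg, aes(class)) +
  geom_bar()
```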
If you are baffled by that %>%, keep reading.
Wait, what is that thing, %>%
?
It's called a pipe. Think of a pipe as a teleporter.
In the previous code, instead of specifying the data frame inside ggplot()
, we teleport it into ggplot()
by using the pipe.
So the following are equivalent.
endangered %>% ggplot(aes(status)) + geom_bar()
ggplot(endangered, aes(status)) + geom_bar()
Don't worry too much if the concept is not clear. It should become clearer in later tutorials.
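If it helps, here is a tiny sketch of the pipe on something simpler than a plot. The vector x and the names piped and nested are made up for illustration. The pipe comes from the magrittr package and is also made available when you attach the tidyverse.

```r
library(magrittr)  # provides %>%

x <- c(1, 4, 9)

piped <- x %>% sqrt()   # x is "teleported" into sqrt() as its first argument
nested <- sqrt(x)       # the same call written without the pipe

identical(piped, nested)  # TRUE
```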
European colonisers forced millions of Africans across the Atlantic, to the Americas. Between 1500 and 1900, more than 12 million people were enslaved and brought to the New World.
The slavery_tot
data frame contains the number of enslaved people disembarked in the Americas according to digitised historical records, curated by the Slave Voyages project.
slavery_tot
Let's create a bar chart with the number of enslaved people, separated by the flag of the ship they travelled on.
We can use geom_bar()
as we did in the previous example, but now there's a catch.
In the endangered
data frame, geom_bar()
did the counting of languages for each endangerment status.
But now, the data frame already holds the counts!
Since slavery_tot
already has counts in the enslaved
column, we can specify both the x-axis (flag
) and the y-axis (enslaved
).
Then we have to tell geom_bar()
that we have already done the counting: we can do this by specifying stat = "identity"
(the default is stat = "count"
).
ggplot( slavery_tot, aes(flag, enslaved) ) + ...
geom_bar(stat = "identity")
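ggplot2 also provides geom_col(), which is a shorthand for a bar geometry that uses the values in the data as bar heights. The sketch below compares the two on a small made-up data frame (fruit, with columns kind and n, invented purely for illustration).

```r
library(ggplot2)

# Made-up, pre-counted data for illustration only
fruit <- data.frame(
  kind = c("apple", "banana", "cherry"),
  n = c(5, 2, 8)
)

# Tell geom_bar() that the counting has already been done...
p_identity <- ggplot(fruit, aes(kind, n)) +
  geom_bar(stat = "identity")

# ...or use geom_col(), which takes the bar heights from the data
p_col <- ggplot(fruit, aes(kind, n)) +
  geom_col()

p_identity
p_col
```

Both plots should look the same. Which form you use is mostly a matter of taste.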
So far, we have used the aesthetics x for the x-axis and y for the y-axis.
Among the many other aesthetics you can specify, there are fill and colour.
The fill aesthetic lets you fill bars or areas with different colours depending on the values of a specified column.
Let's recreate the plot on colonial slavery, this time adding some colour. Complete the following code by specifying that fill
should be based on country
.
slavery_dis %>% ggplot( aes(flag, count, ...) ) + geom_bar(stat = "identity")
We can improve this plot by reordering the flags based on descending number of disembarked people, with reorder(flag, desc(count))
.
slavery_dis %>%
  ggplot(aes(reorder(flag, desc(count)), count, fill = country)) +
  geom_bar(stat = "identity") +
  labs(x = "Flag", y = "Number of people (millions)")
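To get a feel for what reorder() does with desc(), here is a minimal sketch on made-up values. The vectors flag_toy and n_toy are hypothetical and are not taken from the slavery data.

```r
library(dplyr)  # for desc()

# Hypothetical values for illustration only
flag_toy <- c("a", "b", "c")
n_toy <- c(1, 3, 2)

# Default order of the categories: alphabetical
levels(factor(flag_toy))

# Order by descending n_toy: the category with the largest value comes first
ordered_flag <- reorder(flag_toy, desc(n_toy))
levels(ordered_flag)  # "b" "c" "a"
```

ggplot2 places categories on the axis in the order of these levels, which is why the bars end up sorted from largest to smallest.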
What if we want to move the colour legend to the bottom of the plot?
Check out the documentation of theme by typing ?theme in the RStudio console and pressing Enter.
Search for the word position
.
question( "Which of the following moves the legend to the bottom of the plot?", answer('`legend("bottom")`'), answer('`theme(legend.position = "bottom")`', correct = TRUE), answer('`theme(legend.bottom)`') )
Now move the legend to the "bottom"
of the chart.
slavery_dis %>% ggplot( aes( reorder(flag, desc(count)), count, fill = country ) ) + geom_bar(stat = "identity") + labs(x = "Flag", y = "Number of people (millions)") + ...
theme(legend.position = "...")
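The same argument accepts other positions too. As a rough sketch on the mpg data frame bundled with ggplot2 (used here as a stand-in, with illustrative object names), the following places the legend at the top or hides it altogether.

```r
library(ggplot2)

# A bar chart with a fill legend, used as a starting point
base_plot <- ggplot(mpg, aes(class, fill = drv)) +
  geom_bar()

# Legend above the plot
p_top <- base_plot + theme(legend.position = "top")

# No legend at all
p_none <- base_plot + theme(legend.position = "none")

p_top
p_none
```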
colour
works like fill, but it is used with borders or lines of certain geometries like geom_point()
and geom_line()
.
In the following plot, we want to visualise the change in use of land across the globe through time.
There are three uses in the data frame: built_up for construction land, cropland for land used for crops, and grazing for land used in animal foraging.
Land area
is in billion hectares.
Add the aesthetics to the following code and check the resulting plot.
The plot needs year and area as the axes and use for the colour aesthetic.
land_use %>% ggplot(aes(..., ..., ...)) + geom_line(size = 2)
If we wish to use a different type of line depending on the values of a column, we can use the linetype
aesthetic.
Change the plot on land use so that it uses both colour
and linetype
to differentiate the type of land use.
Remember to add year
and area
as the axes.
land_use %>% ggplot(aes(..., ..., ...)) + geom_line(size = 2)
land_use %>% ggplot(aes(year, area, ...)) + geom_line(size = 2)
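For comparison, here is a sketch of the same idea on a different data set, so that the exercise above is not given away. It assumes the economics_long data frame bundled with ggplot2, whose columns include date, variable and value01. The object name p_lines is only illustrative.

```r
library(ggplot2)

# One line per variable, distinguished by both colour and line type
p_lines <- ggplot(
  economics_long,
  aes(date, value01, colour = variable, linetype = variable)
) +
  geom_line()

p_lines
```

Because both aesthetics are mapped to the same column, ggplot2 merges them into a single legend.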
Congratulations! You completed this tutorial.