library(learnr)
library(intRo)
data("endangered")
data("slavery_tot")
data("slavery")
data("harry_potter")
data("land_use")
knitr::opts_chunk$set(echo = FALSE)

The basics

Start here

The tidyverse package ggplot2 provides users with a consistent set of functions to create captivating graphics. To be able to use the functions in a package, you first need to attach the package. Here, we can attach the entire tidyverse, which includes ggplot2.

You only have to attach packages once, when you start your R session in RStudio. They will stay attached until you close RStudio and end the R session.

Attach the tidyverse now.

library(tidyverse)

A basic plot

These are the minimum constituents of a ggplot.

You can specify the data and mapping with the data and mapping arguments of the ggplot() function.

Note that the mapping argument is always specified with aes().

In the following bare plot, we are just mapping the total time of sleep (sleep_total) to the x-axis, and the time of REM (sleep_rem) to the y-axis.

ggplot(
  data = msleep,
  mapping = aes(
    x = sleep_total,
    y = sleep_rem
  )
)

Let's add geometries

Nice, but we are missing the most important part: showing the data!

Data is represented with geometries, or geoms for short. geoms are added to the base ggplot with functions whose names all start with geom_.

For this plot, you want to use geom_point(). This geom simply adds points to the plot based on the data in the msleep data frame.

Add the geom to the following code and run it.

Note that the data and mapping arguments left explicit in the ggplot() function, since they are obligatory and they are specified in that order (try running ?ggplot in the console to see all the arguments of the function).

ggplot(
  msleep,
  aes(sleep_total, sleep_rem)
) +
  ...

This type of plot, with two continuous axes and data represented by points, is called a scatter plot

Bar charts

Another common type of plot is the bar chart.

Bar charts are useful when you are counting things. In the following example, we will be counting the number of languages by their endangerment status.

Do you understand me?

There are thousands of languages in the world, but most of them are loosing speakers, and some are already no longer spoken. The endangerment status of a language goes from not endangered (languages with large populations of speakers) through threatened, shifting and nearly extinct, to extinct (languages that have no living speakers left).

The endangered data frame contains the endangerment status for 7,845 languages from Glottolog. Here's what it looks like.

endangered

To create a bar chart, add geom_bar() to the plot.

You only need one axis, the x axis to be precise, because the y-axis will always have counts.

endangered %>%
  ggplot(aes(status)) +
  geom_bar()

Note how the counting is done automatically. R looks in the status column and counts how many times each value in the column occurs in the data frame.

If you are baffled by that %>%, keep on.

What the pipe?!

Wait, what is that thing, %>%?

It's called a pipe. Think of a pipe as a teleporter.

In the previous code, instead of specifying the data frame inside ggplot(), we teleport it into ggplot() by using the pipe.

So the following are equivalent.

endangered %>%
  ggplot(aes(status)) +
  geom_bar()

ggplot(
  endangered,
  aes(status)
  ) +
  geom_bar()

Don't worry too much if the concept is not clear. It should become clearer in later tutorials.

Forced journey across the Atlantic

European colonisers forced millions of Africans across the Atlantic, to the Americas. Between 1500 and 1900, more than 12 million people were enslaved and brought to the New World.

The slavery_tot data frame contains the number of enslaved people disembarked in the Americas according to digitised historical records, curated by the Slave Voyages project.

slavery_tot

Let's create a bar chart with number of enslaved people, separated by the flag of the ship they travelled on.

We can use geom_bar() as we did in the previous example, but now there's a catch.

In the endangered data frame, geom_bar() did the counting of languages for each endangerment status. But now, the data frame already holds the counts!

Since slavery_tot already has counts in the enslaved column, we can specify both the x-axis (flag) and the y-axis (enslaved). Then we have to tell geom_bar() that we have already done the counting: we can do this by specifying stat = "identity" (the default is stat = "count").

ggplot(
  slavery_tot,
  aes(flag, enslaved)
) +
  ...
geom_bar(stat = "identity")

Fill and colour

So far, we have used the aesthetics x for the x-axis and y for the y-axis.

Among the many other aesthetics you can specify, there are fill and colour.

Fill

The fill aesthetic let's you fill bars or areas with different colours depending on the values of a specified column.

Let's recreate the plot on colonial slavery, this time adding some colour. Complete the following code by specifying that fill should be based on country.

slavery_dis %>%
  ggplot(
    aes(flag, count, ...)
  ) +
  geom_bar(stat = "identity")

We can improve this plot by reordering the flags based on descending number of disembarked people, with reorder(flag, desc(count)).

slavery_dis %>%
  ggplot(
    aes(
      reorder(flag, desc(count)),
      count,
      fill = country
    )
  ) +
  geom_bar(stat = "identity") +
  labs(x = "Flag", y = "Number of people (millions)")

What if we want to move the colour legend to the bottom of the plot?

Check out the documentation of theme by typing ?theme in the RStudio console and press enter. Search for the word position.

question(
  "Which of the following moves the legend to the bottom of the plot?",
  answer('`legend("bottom")`'),
  answer('`theme(legend.position = "bottom")`', correct = TRUE),
  answer('`theme(legend.bottom)`')
)

Move the legend

Now move the legend to the "bottom" of the chart.

slavery_dis %>%
  ggplot(
    aes(
      reorder(flag, desc(count)),
      count,
      fill = country
    )
  ) +
  geom_bar(stat = "identity") +
  labs(x = "Flag", y = "Number of people (millions)") +
  ...
theme(legend.position = "...")

Colour

colour works like fill, but it is used with borders or lines of certain geometries like geom_point() and geom_line().

In the following plot, we wan to visualise the change in use of land across the globe through time. There are three uses in data frame: built_up for construction land, cropland for land use for crops, and grazing for land used in animal foraging. Land area is in billion hectares.

Add the aesthetics to the following code and check the resulting plot. The plot needs year and area as the axes and use for the colour aesthetics.

land_use %>%
  ggplot(aes(..., ..., ...)) +
  geom_line(size = 2)

Line type

If we wish to use a different type of line depending on the values of a column, we can use the linetype aesthetic.

Change the plot on land use so that it uses both colour and linetype to differentiate the type of land use.

Remember to add year and area as the axes.

land_use %>%
  ggplot(aes(..., ..., ...)) +
  geom_line(size = 2)
land_use %>%
  ggplot(aes(year, area, ...)) +
  geom_line(size = 2)

The end!

Congratulations! You completed this tutorial.



intro-rstats/intRo documentation built on Dec. 20, 2021, 7:58 p.m.