library(learnr)
library(tidyverse)
library(ggmosaic)
library(ggalluvial)
knitr::opts_chunk$set(echo = TRUE)
library(intRo)
data("endangered")
data("europe")
data("alb_capitals")
data("personality")
data("harry_potter")

Faceting

Faceting allows you to split data in a plot into separate panels.

You can facet both vertically or horizontally.

I am still waiting for my Howgwarts letter...

Let's do that with the Harry Potter data. The data frame looked like this.

harry_potter

We have seen a plot of the data before.

Reproduce it with the following code.

harry_potter %>%
  ggplot(aes(element, fill = house)) +
  geom_bar(position = "dodge") +
  # scale_fill_manual() lets you manually specify colours.
  scale_fill_manual(
    values = c("#76040a", "#f29e02", "#0121a2", "#1d492c")
  )

Now let's plot the number of people in each astrological sign and house.

harry_potter %>%
  ggplot(aes(sign, fill = house)) +
  geom_bar(position = "dodge") +
  scale_fill_manual(
    values = c("#76040a", "#f29e02", "#0121a2", "#1d492c")
  )

It's a bit too dense isn't it?

We can improve on that by faceting house so that each house has its own panel.

Facets

You can facet with facet_grid(). This function needs a formula of the form rows ~ columns, where rows and columns are names of columns from the data frame. If you want to facet just by row or just by column, you can replace the other side of the formula with a full stop .: rows ~ ., . ~ columns.

Let's see an example with the former.

harry_potter %>%
  ggplot(aes(sign, fill = house)) +
  geom_bar(position = "dodge") +
  scale_fill_manual(
    values = c("#76040a", "#f29e02", "#0121a2", "#1d492c")
  ) +
  facet_grid(house ~ .)

Much better now! The code facet_grid(house ~ .) asks ggplot2 to facet the data by house and display the panels as individual rows.

Try now to facet by element and to display each element as a separate vertical panel (column) rather than horizontally (rows).

harry_potter %>%
  ggplot(aes(house, fill = house)) +
  geom_bar(position = "dodge") +
  scale_fill_manual(
    values = c("#76040a", "#f29e02", "#0121a2", "#1d492c")
  ) +
  ...

Wonderful!

Wrapping up

When you have many values in a column you want to display in separate panels spanning rows and columns, you can use facet_wrap().

Like facet_grid(), facet_wrap() needs a function, but it only takes functions of the type . ~ colname. In fact, you can omit the full stop . and write ~ colname.

(You can do the same with facet_grid(): facet_grid(~ element) will work too. But facet_grid(house ~) does not!)

Here's how that looks like!

harry_potter %>%
  ggplot(aes(house, fill = house)) +
  geom_bar(position = "dodge") +
  scale_fill_manual(
    values = c("#76040a", "#f29e02", "#0121a2", "#1d492c")
  ) +
  facet_wrap(~ sign)

That's easy, right? There's so much more to learn about ggplot2. But for this workshop we are stopping here.

You can find more in the R for Data Science book, a great resource for self-guided learning. You can read the book here: https://r4ds.had.co.nz.

In the following sections you will go through a "showreel" of other things you can do with R, most of which can be done with tidyverse packages or packages that work well with the tidyverse.

In these sections, I will point you to external resources where you can learn more about these and other advanced skills, and the last section Extra resources has some more pointers!

I hope you enjoyed this data journey and that you will want to use R for your data analysis in the future!

knitr::include_graphics("images/matthew-henry-2Ts5HnA67k8-unsplash.jpg")

Maps, maps, maps!

It's very easy to plot maps in R. If this is your cup of tea, the excellent book Geocomputation with R, by Lovelace, Nowosad and Muenchow, will teach you all you need to know. Check the it out!

Look at how simple it is to create maps. Note: to avoid asking you to install external software required for the package sf, I will just show you code to plot a map and the image output. For info on how to use sf, see https://r-spatial.github.io/sf/index.html.

library(ggrepel)
library(sf)

ggplot() +
  geom_sf(data = europe, fill = "antiquewhite1") +
  geom_point(data = alb_capitals, aes(lng, lat), size = 0.5) +
  geom_text_repel(data = alb_capitals, aes(lng, lat, label = city), size = 3, fontface = "italic") +
  coord_sf(c(19, 21.5), c(39.5, 42.7)) +
  theme(panel.background = element_rect(fill ="aliceblue"))

And this is the map.

knitr::include_graphics("images/alb.png")

Getting artsy with mosaic plots

Mosaic plots are a good way to visualise the number of occurrences in two intersecting categorical variables.

In the following code, we are plotting our endangered data frame, with status on the x-axis and Macroarea on the y-axis.

The main function here is geom_mosaic(). geom_mosaic() is part of the ggmosaic package.

library(ggmosaic)

endangered %>%
  ggplot() +
  geom_mosaic(
    aes(x = product(Macroarea), fill = status), divider = mosaic("v")
  ) +
  scale_fill_brewer(palette = "Reds") +
  theme_mosaic() +
  # This rotates the tick labels of the x-axis
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

The coloured boxes in the plot are proportional to the number of languages in the data frame that belong to the intersecting cells of status and Macroarea. Proportionality is represented both in the horizontal and vertical axes.

Visualise questionnaire data

Likert scales

If you have questionnaire data that uses Likert scales, diverging stacked bar charts are for you.

You have two options:

In either case, you will very likely have to wrangle the data so that it can be plotted.

You can see worked out examples at these websites:

Due to time constraints we won't be able to go through them, but everything you learnt during the workshop will have gotten you up to speed to be able to follow the instructions in the links above.

Alluvial plots

You can create alluvial plots to show the distribution of occurrences across several categorical variables. This type of plots comes handy with questionnaire data which is stratified by, for example, age, gender

In the following example, we are visualising the distribution of survival of the passengers of the Titanic (data from https://www.encyclopedia-titanica.org/explorer/).

titanic_wide <- data.frame(Titanic)

titanic_wide %>%
  ggplot(
    aes(axis1 = Class, axis2 = Sex, axis3 = Age, y = Freq)
  ) +
  geom_alluvium(aes(fill = Survived)) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Class", "Gender", "Age"), expand = c(.2, .05)) +
  labs(
    x = "Demographic",
    title = "Passengers on the maiden voyage of the Titanic",
    subtitle = "Stratified by demographics and survival"
  )
  theme_minimal()

The geometries geom_alluvium() and geom_stratum() are the core functions of alluvial plots. They are provided by the ggalluvial package. You can read more about alluvial plots with ggalluvial here: https://corybrunson.github.io/ggalluvial/.

Grouping and summarising

Part of data transformation requires you to group and summarise data.

The group_by() and summarise() functions can help you do exactly that.

Here's an example of how they work.

harry_potter %>%
  group_by(house, element) %>%
  summarise(
    number = n()
  )

Read more about them here: https://r4ds.had.co.nz/transform.html#grouped-summaries-with-summarise

Extra resources



intro-rstats/intRo documentation built on Dec. 20, 2021, 7:58 p.m.