library(learnr)
library(tidyverse)
library(ggmosaic)
library(ggalluvial)
knitr::opts_chunk$set(echo = TRUE)
library(intRo)
data("endangered")
data("europe")
data("alb_capitals")
data("personality")
data("harry_potter")
Faceting allows you to split data in a plot into separate panels.
You can facet either vertically or horizontally.
Let's do that with the Harry Potter data. The data frame looked like this.
harry_potter
We have seen a plot of the data before.
Reproduce it with the following code.
harry_potter %>%
  ggplot(aes(element, fill = house)) +
  geom_bar(position = "dodge") +
  # scale_fill_manual() lets you manually specify colours.
  scale_fill_manual(
    values = c("#76040a", "#f29e02", "#0121a2", "#1d492c")
  )
Now let's plot the number of people in each astrological sign and house.
harry_potter %>%
  ggplot(aes(sign, fill = house)) +
  geom_bar(position = "dodge") +
  scale_fill_manual(
    values = c("#76040a", "#f29e02", "#0121a2", "#1d492c")
  )
It's a bit too dense, isn't it?
We can improve on that by faceting by house so that each house has its own panel.
You can facet with facet_grid().
This function needs a formula of the form rows ~ columns, where rows and columns are names of columns from the data frame.
If you want to facet just by row or just by column, you can replace the other side of the formula with a full stop (.): rows ~ . or . ~ columns.
Let's see an example with the former.
harry_potter %>%
  ggplot(aes(sign, fill = house)) +
  geom_bar(position = "dodge") +
  scale_fill_manual(
    values = c("#76040a", "#f29e02", "#0121a2", "#1d492c")
  ) +
  facet_grid(house ~ .)
Much better now!
The code facet_grid(house ~ .) asks ggplot2 to facet the data by house and display the panels as individual rows.
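A formula with names on both sides facets by two variables at once. As a minimal sketch, assuming the mpg data set that ships with ggplot2 (it is not part of the workshop data), the following splits the panels by drv in rows and by year in columns.

```r
library(ggplot2)

# mpg ships with ggplot2; it stands in for the workshop data here.
p_grid <- ggplot(mpg, aes(class)) +
  geom_bar() +
  # Rows are split by drv, columns by year.
  facet_grid(drv ~ year)

p_grid
```

With three values of drv and two of year, this gives a grid of six panels.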
Now try to facet by element, displaying each element as a separate column (vertical panel) rather than as a row.
harry_potter %>%
  ggplot(aes(house, fill = house)) +
  geom_bar(position = "dodge") +
  scale_fill_manual(
    values = c("#76040a", "#f29e02", "#0121a2", "#1d492c")
  ) +
  ...
Wonderful!
When a column has many values that you want to display in separate panels spanning both rows and columns, you can use facet_wrap().
Like facet_grid(), facet_wrap() needs a formula, but it only takes formulas of the type . ~ colname.
In fact, you can omit the full stop and write ~ colname.
(You can do the same with facet_grid(): facet_grid(~ element) will work too. But facet_grid(house ~) does not!)
Here's what that looks like!
harry_potter %>%
  ggplot(aes(house, fill = house)) +
  geom_bar(position = "dodge") +
  scale_fill_manual(
    values = c("#76040a", "#f29e02", "#0121a2", "#1d492c")
  ) +
  facet_wrap(~ sign)
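facet_wrap() also lets you control how the panels are laid out. A minimal sketch, again assuming the mpg data set from ggplot2 rather than the workshop data, uses the ncol argument to wrap the panels into three columns.

```r
library(ggplot2)

# mpg ships with ggplot2; it stands in for the workshop data here.
p_wrap <- ggplot(mpg, aes(drv)) +
  geom_bar() +
  # ncol fixes the number of columns; rows are added as needed.
  facet_wrap(~ class, ncol = 3)

p_wrap
```

mpg has seven values of class, so the panels wrap over three rows.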
That's easy, right? There's so much more to learn about ggplot2. But for this workshop we are stopping here.
You can find more in the R for Data Science book, a great resource for self-guided learning. You can read the book here: https://r4ds.had.co.nz.
In the following sections you will go through a "showreel" of other things you can do with R, most of which can be done with tidyverse packages or packages that work well with the tidyverse.
In these sections, I will point you to external resources where you can learn more about these and other advanced skills, and the last section, Extra resources, has some more pointers!
I hope you enjoyed this data journey and that you will want to use R for your data analysis in the future!
knitr::include_graphics("images/matthew-henry-2Ts5HnA67k8-unsplash.jpg")
It's very easy to plot maps in R. If this is your cup of tea, the excellent book Geocomputation with R, by Lovelace, Nowosad and Muenchow, will teach you all you need to know. Check it out!
Look at how simple it is to create maps. Note: to avoid asking you to install external software required for the package sf, I will just show you code to plot a map and the image output. For info on how to use sf, see https://r-spatial.github.io/sf/index.html.
library(ggrepel)
library(sf)

ggplot() +
  geom_sf(data = europe, fill = "antiquewhite1") +
  geom_point(data = alb_capitals, aes(lng, lat), size = 0.5) +
  geom_text_repel(
    data = alb_capitals,
    aes(lng, lat, label = city),
    size = 3, fontface = "italic"
  ) +
  coord_sf(xlim = c(19, 21.5), ylim = c(39.5, 42.7)) +
  theme(panel.background = element_rect(fill = "aliceblue"))
And this is the map.
knitr::include_graphics("images/alb.png")
Mosaic plots are a good way to visualise the number of occurrences in two intersecting categorical variables.
In the following code, we are plotting our endangered data frame, with status on the x-axis and Macroarea on the y-axis.
The main function here is geom_mosaic().
geom_mosaic() is part of the ggmosaic package.
library(ggmosaic)

endangered %>%
  ggplot() +
  geom_mosaic(
    aes(x = product(Macroarea), fill = status),
    divider = mosaic("v")
  ) +
  scale_fill_brewer(palette = "Reds") +
  theme_mosaic() +
  # This rotates the tick labels of the x-axis
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
The coloured boxes in the plot are proportional to the number of languages in the data frame that belong to the intersecting cells of status and Macroarea.
Proportionality is represented on both the horizontal and vertical axes.
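To see the numbers that such boxes stand for, you can count the rows in each intersecting cell. This is only a sketch on a hypothetical toy data frame (langs, with made-up values), not the endangered data; it uses count() from dplyr and then computes the proportion of each status within a macroarea.

```r
library(dplyr)

# Hypothetical toy data; the values are made up for illustration only.
langs <- data.frame(
  Macroarea = c("Africa", "Africa", "Africa", "Eurasia", "Eurasia"),
  status = c("safe", "safe", "endangered", "endangered", "extinct")
)

cell_counts <- langs %>%
  # One row per combination of Macroarea and status, with its count n.
  count(Macroarea, status) %>%
  group_by(Macroarea) %>%
  # Share of each status within its macroarea.
  mutate(prop = n / sum(n)) %>%
  ungroup()

cell_counts
```

The n column holds the cell counts, and prop shows how each macroarea is divided among the statuses.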
If you have questionnaire data that uses Likert scales, diverging stacked bar charts are for you.
You have two options:
likert() from the HH package.
In either case, you will very likely have to wrangle the data so that it can be plotted.
You can see worked out examples at these websites:
likert()
Due to time constraints we won't be able to go through them, but what you learnt during the workshop should be enough for you to follow the instructions in the links above.
You can create alluvial plots to show the distribution of occurrences across several categorical variables. This type of plot comes in handy with questionnaire data that is stratified by, for example, age and gender.
In the following example, we are visualising the distribution of survival of the passengers of the Titanic (data from https://www.encyclopedia-titanica.org/explorer/).
titanic_wide <- data.frame(Titanic)

titanic_wide %>%
  ggplot(
    aes(axis1 = Class, axis2 = Sex, axis3 = Age, y = Freq)
  ) +
  geom_alluvium(aes(fill = Survived)) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Class", "Gender", "Age"), expand = c(.2, .05)) +
  labs(
    x = "Demographic",
    title = "Passengers on the maiden voyage of the Titanic",
    subtitle = "Stratified by demographics and survival"
  ) +
  theme_minimal()
The geometries geom_alluvium() and geom_stratum() are the core functions of alluvial plots.
They are provided by the ggalluvial package.
You can read more about alluvial plots with ggalluvial here: https://corybrunson.github.io/ggalluvial/.
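The plotting code starts by turning the built-in Titanic table into a data frame. A quick sketch of what that conversion produces, using base R only:

```r
# Titanic is a 4-dimensional contingency table from the datasets package,
# which ships with R.
titanic_wide <- data.frame(Titanic)

# One row per combination of Class, Sex, Age and Survived,
# with the number of passengers in the Freq column.
names(titanic_wide)
nrow(titanic_wide)
sum(titanic_wide$Freq)
```

The data frame has 32 rows, one per combination of the four variables, and the Freq column adds up to 2201 passengers. This is the column mapped to y in the plot.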
Part of data transformation requires you to group and summarise data.
The group_by() and summarise() functions can help you do exactly that.
Here's an example of how they work.
harry_potter %>%
  group_by(house, element) %>%
  summarise(
    number = n()
  )
Read more about them here: https://r4ds.had.co.nz/transform.html#grouped-summaries-with-summarise
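If you want to try the same pattern without the workshop data, here is a minimal sketch on a hypothetical toy data frame (students, with made-up values). The .groups = "drop" argument is an optional addition that returns an ungrouped result.

```r
library(dplyr)

# Hypothetical toy data, not the workshop's harry_potter data frame.
students <- data.frame(
  house = c("Gryffindor", "Gryffindor", "Slytherin", "Slytherin", "Slytherin"),
  element = c("fire", "fire", "water", "water", "earth")
)

counts <- students %>%
  # Form one group per combination of house and element.
  group_by(house, element) %>%
  # Collapse each group to a single row holding its size.
  summarise(number = n(), .groups = "drop")

counts
```

Each row of counts is one combination of house and element that occurs in the data, and number is how many students fall into it.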
The tidyverse website.
Create your own R tutorial with learnr.