In appliedepi/introexercises:

# load packages ----------------------------------------------------------------
library(introexercises)
library(learnr)
library(gradethis)
library(flair)
library(dplyr)
library(ggplot2)
library(gghighlight)
library(janitor)
library(stringr)
library(lubridate)
library(epikit)
library(fontawesome)
library(scales)
# library(RMariaDB)        # connect to sql database 

## set options for exercises and checking ---------------------------------------

## Define how exercises are evaluated 
gradethis::gradethis_setup(
  ## note: the below arguments are passed to learnr::tutorial_options
  ## set the maximum execution time limit in seconds
  exercise.timelimit = 60, 
  ## set how exercises should be checked (defaults to NULL - individually defined)
  # exercise.checker = gradethis::grade_learnr
  ## set whether to pre-evaluate exercises (so users see answers)
  exercise.eval = FALSE 
)

# ## event recorder ---------------------------------------------------------------
# ## see for details: 
# ## https://pkgs.rstudio.com/learnr/articles/publishing.html#events
# ## https://github.com/dtkaplan/submitr/blob/master/R/make_a_recorder.R
# 
# ## connect to your sql database
# sqldtbase <- dbConnect(RMariaDB::MariaDB(),
#                        user     = Sys.getenv("userid"),
#                        password = Sys.getenv("pwd"),
#                        dbname   = 'excersize_log',
#                        host     = "144.126.246.140")
# 
# 
# ## define a function to collect data 
# ## note that tutorial_id is defined in YAML
#     ## you could set the tutorial_version too (by specifying version:) but use package version instead 
# recorder_function <- function(tutorial_id, tutorial_version, user_id, event, data) {
#     
#   ## define a sql query 
#   ## first bracket defines variable names
#   ## values bracket defines what goes in each variable
#   event_log <- paste("INSERT INTO responses (
#                        tutorial_id, 
#                        tutorial_version, 
#                        date_time, 
#                        user_id, 
#                        event, 
#                        section,
#                        label, 
#                        question, 
#                        answer, 
#                        code, 
#                        correct)
#                        VALUES('", tutorial_id,  "', 
#                        '", tutorial_version, "', 
#                        '", format(Sys.time(), "%Y-%M%-%D %H:%M:%S %Z"), "',
#                        '", Sys.getenv("SHINYPROXY_PROXY_ID"), "',
#                        '", event, "',
#                        '", data$section, "',
#                        '", data$label,  "',
#                        '", paste0('"', data$question, '"'),  "',
#                        '", paste0('"', data$answer,   '"'),  "',
#                        '", paste0('"', data$code,     '"'),  "',
#                        '", data$correct, "')",
#                        sep = '')
# 
#     # Execute the query on the sqldtbase that we connected to above
#     rsInsert <- dbSendQuery(sqldtbase, event_log)
#   
# }
# 
# options(tutorial.event_recorder = recorder_function)

# hide non-exercise code chunks ------------------------------------------------
knitr::opts_chunk$set(echo = FALSE)

# data prep --------------------------------------------------------------------
surv <- rio::import(system.file("dat/old_version/surveillance_linelist_clean_20141201.rds", package = "introexercises"))

Introduction to R for Applied Epidemiology and Public Health

Welcome

Welcome to the live course "Introduction to R for applied epidemiologists", offered by Applied Epi - a nonprofit organisation that offers open-source tools, training, and support to frontline public health practitioners.

knitr::include_graphics("images/logo.png", error = F)

Data visualization

This exercise focuses on scales and themes in {ggplot2}.

Format

This exercise will guide you through a set of tasks.
You should perform these tasks in RStudio and on your local computer.

Getting Help

There are several ways to get help:

1) Look for the "helpers" (see below) 2) Ask your live course instructor/facilitator for help
3) Ask a colleague or other participant in the course for tips
4) Post a question in Applied Epi Community in the category for questions about Applied Epi Training

Here is what those "helpers" will look like:

r fontawesome::fa("lightbulb", fill = "gold") Click to read a hint

Here you will see a helpful hint!

r fontawesome::fa("check", fill = "red")Click to see a solution (try it yourself first!)

linelist %>% 
  filter(
    age > 25,
    district == "Bolo"
  )

Here is more explanation about why the solution works.

Quiz questions

Please complete the quiz questions that you encounter throughout the tutorial. Answering will help you to comprehend the material, and will also help us to improve the exercises for future students.

To practice, please answer the following questions:

quiz(
  question_radio("When should I view the red 'helper' code?",
    answer("After trying to write the code myself", correct = TRUE),
    answer("Before I try coding", correct = FALSE),
    correct = "Reviewing best-practice code after trying to write yourself can help you improve",
    incorrect = "Please attempt the exercise yourself, or use the hint, before viewing the answer."
  )
)

question_numeric(
 "How anxious are you about beginning this tutorial - on a scale from 1 (least anxious) to 10 (most anxious)?",
 answer(10, message = "Try not to worry, we will help you succeed!", correct = T),
 answer(9, message = "Try not to worry, we will help you succeed!", correct = T),
 answer(8, message = "Try not to worry, we will help you succeed!", correct = T),
 answer(7, message = "Try not to worry, we will help you succeed!", correct = T),
 answer(6, message = "Ok, we will get there together", correct = T),
 answer(5, message = "Ok, we will get there together", correct = T),
 answer(4, message = "I like your confidence!", correct = T),
 answer(3, message = "I like your confidence!", correct = T),
 answer(2, message = "I like your confidence!", correct = T),
 answer(1, message = "I like your confidence!", correct = T),
 allow_retry = TRUE,
 correct = "Thanks for sharing. ",
 min = 1,
 max = 10,
 step = 1
)

Icons

You will see these icons throughout the exercises:

Icon |Meaning ------|-------------------- r fontawesome::fa("eye", fill = "darkblue")|Observe
r fontawesome::fa("exclamation", fill = "red")|Alert!
r fontawesome::fa("pen", fill = "brown")|An informative note
r fontawesome::fa("terminal", fill = "black")|Time for you to code!
r fontawesome::fa("window-restore", fill = "darkgrey")|Change to another window
r fontawesome::fa("bookmark", fill = "orange")|Remember this for later

License

Applied Epi Incorporated, 2022
This work is licensed by Applied Epi Incorporated under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Please email contact@appliedepi.org with questions about the use of these materials for academic courses and epidemiologist training programs.

Learning objectives

In this exercise you will:

Practice adjusting the scales commands within {ggplot2}
Make adjustments to the themes of ggplots
Save ggplots as PNG files

Prepare

Prepare your script

Open the R project, as usual, and your script "ebola_analysis.R".

Load packages

Add the following R packages to your {pacman} command at the top of your script:

{RColorBrewer}
{viridis}
{scales}

Run the code to install your R packages.

Run previous code

If you left RStudio since the previous exercise, clear your environment and then re-run all code in your script in order to again use the cleaned surv dataset. If you encounter errors, you always have the option of importing the clean dataset from "data/clean" subfolder.

Color scales

Scale commands replace defaults of how the aesthetic mappings manifest, such as:

Which colors or shapes to display
The min/max of point sizes
The min/max and frequency of axes breaks

As a generic formula, these commands are written as: scale_AESTHETIC_METHOD().

scale_ : this prefix never changes
AESTHETIC: _fill_ , _color_ , _x_ , _y_ , etc.
METHOD: _continuous(), _discrete(), _manual(), _date(), etc.

Some examples of scale commands:

You want to adjust |Scale command
----------------------|-------------------
continuous y-axis |scale_y_continuous()
date x-axis |scale_x_date()
categorical x-axis |scale_x_discrete()
fill, continuous |scale_fill_continuous()
fill, continuous |scale_fill_gradient()
color, manual assignment|scale_color_manual()

Here we show two different ways to create a continuous color gradient.

The scale_*_continuous functions work with pre-built gradient palettes
scale_*_gradient() creates a 2 color gradient
scale_*_gradient2 allows you to also set a midpoint color between these two
scale_gradient_n() allows you to create more complex palettes.

More information on these functions is available here.

Default color scales

ggplot(
  data = surv,
  mapping = aes(
    x = district,
    fill = gender)) +
geom_bar()

Above, the fill of a bar plot uses the default colors and axis breaks. We can adjust the elements of this plot with a scale_AESTHETIC_METHOD() function added (+) to the end of our ggplot()

Adjust fill

Here we adjust the fill color of the bars manually (scale_fill_manual()). We provide assignments to the values in our dataset ("male" and "female") within a vector (c()). To assign a color to the NA values we need to specify this with the separate argument na.value =

ggplot(
  data = surv,
  mapping = aes(
    x = district,
    fill = gender)) +
geom_bar() +
scale_fill_manual(        
  values = c(
   "male" = "violetred", 
   "female" = "aquamarine"),
   na.value = "green")

Here we have chosen some ugly colors to highlight what we are changing! Try changing the color for "male" to "dodgerblue" and "female" to "tomato" for a nicer color combination in the code below. Also set NA to be "grey", a common standard plot color.

r fontawesome::fa("check", fill = "red")Click to see a solution (try it yourself first!)

ggplot(
  data = surv,
  mapping = aes(
    x = district,
    fill = gender)) +
geom_bar() +
scale_fill_manual(        
  values = c(
    "male" = "dodgerblue",
    "female" = "tomato"),
  na.value = "grey")

Note that the character values you put in the vector c() need to match the vales in the data exactly (e.g. "Male" is NOT the same as "male").

Built-in color scales

{ggplot2} and the package {RColorBrewer} offers a number of pre-configured palettes for color scales that are continuous, discrete, diverging, etc.

As we are working here with discrete data we can use the function scale_fill_brewer() to access the following palettes rather than specifying our own colors:

RColorBrewer::display.brewer.all()

As we are working with a discrete scale, the middle group of color palettes are most appropriate. Below we select the palette "Pastel2", and specify that missing values should be "grey".

ggplot(
  data = surv,
  mapping = aes(
    x = district,
    fill = gender)) +
geom_bar() +
scale_fill_brewer(palette = "Pastel2",
                  na.value = "grey")

A color-blind friendly palette is available as well. This comes in discrete and continuous forms scale_fill_viridis_d() and scale_fill_viridis_c():

ggplot(
  data = surv,
  mapping = aes(
    x = district,
    fill = gender)) +
geom_bar() +
scale_fill_viridis_d(na.value = "grey")

Adjust the ggplot command above to use age_cat instead of gender. How do the brewer and viridis_d color scales look with more categories?

quiz(caption = "Quiz - color brewer",
  question("What color is the age group 10-19 using scale_fill_brewer(palette = 'Pastel2')?",
    allow_retry = T,
    answer("green"),
    answer("yellow"),
    answer("orange", correct = T),
    answer("brown")
  ),
  question("What is the importance of the '_fill_' in the commands?",
    allow_retry = T,
    answer("It instructs the plot to be filled with joy"),
    answer("Because this is a bar plot, fill is the aesthetic that produces the colors in the bars", correct = T),
    answer("It sets the background color (grey)"),
    answer("It adjusts the spaces between the bars")
  )
)

Continuous color scales

Try applying what you have learned to add a continuous viridis palette to the following plot. Be aware that here we are dealing with a color rather than fill aesthetic because we use geom_point(). It is best practice to also specify an na.value =

ggplot(
  data = surv,
  mapping = aes(
    x = age_years,
    y = wt_kg,
    color = temp)) +
geom_point()

r fontawesome::fa("check", fill = "red")Click to see a solution (try it yourself first!)

ggplot(
  data = surv,
  mapping = aes(
    x = age_years,
    y = wt_kg,
    color = temp)) +
geom_point() +
scale_color_viridis_c(na.value = "grey")

Here are some further resources for you:

Viridis (try with option = "plasma" or "inferno"), and colorbrewer palette functions can be added to any ggplot.

Axes scales

We can edit axes in a similar way, with similar commands.

Adjusting Y-axis

In a barplot such as the one below, we have a continuous Y-axis and discrete X-axis. Here we might decide that the counts on the Y-axis are not descriptive enough so we wish to supply our own break points.

ggplot(
  data = surv,
  mapping = aes(
    x = district,
    fill = gender)) +
geom_bar() +
scale_fill_viridis_d(na.value = "grey")

In scale_y_continuous() we adjust the Y-axis breaks using seq() to define a numeric sequence.

Try running the command seq(from = 0, to = 250, by = 25) in the R Console, just to see the result. Try it again with different argument values.

Now, add the function scale_y_continuous(). Inside this function, set the argument breaks = to seq() as written above.

r fontawesome::fa("check", fill = "red")Click to see a solution (try it yourself first!)

ggplot(
  data = surv,
  mapping = aes(
    x = district,
    fill = gender)) +
geom_bar() +
scale_fill_viridis_d(na.value = "grey") +
scale_y_continuous(breaks = seq(from = 0,
                                to = 250,
                                by = 25))

Starting scales at 0

You may have noticed that {ggplot2} has a behavior of expanding your axis beyond the data, with a gap between the values and the axis at the bottom. This can be fixed with the axes scales using the expand = argument.

Using the previous ggplot command, add a second argument to scale_y_continuous() that is expand = c(0,0). This is telling ggplot() to start the Y-axis at the plot coordinates (0,0) with no buffer space.

r fontawesome::fa("check", fill = "red")Click to see a solution (try it yourself first!)

ggplot(
  data = surv,
  mapping = aes(
    x = district,
    fill = gender)) +
geom_bar() +
scale_fill_viridis_d(na.value = "grey") +
scale_y_continuous(breaks = seq(from = 0,
                                to = 250,
                                by = 25),
                   expand = c(0,0))

Try applying the same expand = c(0,0) syntax to the discrete x-axis, by adding scale_x_discrete():

r fontawesome::fa("check", fill = "red")Click to see a solution (try it yourself first!)

ggplot(
  data = surv,
  mapping = aes(
    x = district,
    fill = gender)) +
geom_bar() +
scale_fill_viridis_d(na.value = "grey") +
scale_y_continuous(breaks = seq(from = 0,
                                to = 250,
                                by = 25),
                   expand = c(0,0)) +
scale_x_discrete(expand = c(0,0))

Flip axes

Finally, we can flip the X and Y axes by adding coord_flip() (with empty parentheses). This is useful in bar charts so that more discrete value names can be displayed without overlapping each other on the x-axis.

ggplot(
  data = surv,
  mapping = aes(
    x = district,
    fill = gender)) +
geom_bar() +
scale_fill_viridis_d(na.value = "grey") +
scale_y_continuous(breaks = seq(from = 0,
                                to = 250,
                                by = 25),
                   expand = c(0,0)) +
scale_x_discrete(expand = c(0,0))+
coord_flip()

Note that if adjusting the labels with labs() you still edit x = to adjust the label that now appears on the y-axis.

Date axis labels

Date axes also have scales that can be adjusted with scale_*() functions.

The default scale for date axis labels will vary by the range of your data. Here is an example plot:

ggplot(
  data = surv,
  mapping = aes(x = date_onset)) +
geom_histogram()

Adjust axis labels with scale_x_date().

Manual date breaks

Within scale_x_date(), you can use the argument date_breaks= to provide values like "1 week", "2 weeks", or "3 months".

Note: these are the axis label breaks, the don't affect the bins of the histogram (bar widths), We will discuss best practices for setting the binwidths of histograms for epidemic curves in a subsequent module.

Try editing this code so that the date labels appear every 2 months:

ggplot(
  data = surv,
  mapping = aes(x = date_onset)) +
geom_histogram()

r fontawesome::fa("check", fill = "red")Click to see a solution (try it yourself first!)

ggplot(
  data = surv,
  mapping = aes(x = date_onset)) +
geom_histogram() +
scale_x_date(date_breaks = "2 months")

quiz(caption = "Quiz - date breaks",
  question("What is the default date format displayed when setting 'date_breaks =' ?",
    allow_retry = T,
    answer("MM/DD/YYYY"),
    answer("DD/MM/YYYY"),
    answer("YYYY-MM-DD", correct = T),
    answer("YYYY-DD-MM")
  )
)

Date axis labels

You (or your supervisor) may not like the date labels appearing as YYYY-MM-DD.

You can specify the date labels format with the additional argument date_labels =.

This argument accepts a character value (within quotes), constructed using "strptime" syntax - see R documentation for more information on this

For example: the value "%d %b %Y" will change the display to DD MMM YYYY (note spaces instead of dashes). You can also trigger a new line with \n, e.g. to move the year below the day and month.

ggplot(
  data = surv,
  mapping = aes(x = date_onset)) +
geom_histogram() +
scale_x_date(date_breaks = "2 months",
             date_labels = "%d %b \n %Y" )

Here is the complete list of strptime abbreviations:

%d = Day number of month (5, 17, 28, etc.)
%j = Day number of the year (Julian day 001-366)
%a = Abbreviated weekday (Mon, Tue, Wed, etc.)
%A = Full weekday (Monday, Tuesday, etc.)
%w = Weekday number (0-6, Sunday is 0)
%u = Weekday number (1-7, Monday is 1)
%W = Week number (00-53, Monday is week start)
%U = Week number (01-53, Sunday is week start)
%m = Month number (e.g. 01, 02, 03, 04)
%b = Abbreviated month (Jan, Feb, etc.)
%B = Full month (January, February, etc.)
%y = 2-digit year (e.g. 89)
%Y = 4-digit year (e.g. 1989)
%h = hours (24-hr clock)
%m = minutes
%s = seconds
%z = offset from GMT
%Z = Time zone (character)

See Epi R Handbook Epicurves and Strings pages for more tips

Auto-efficient date axes

There is also a built-in simplification for date labels using the {scales} package.

Confusingly this is applied using the labels = rather than date_labels = argument. Assigning labels = to label_date_short(). This produces date axis labels that automatically show the least amount of informaton necessary to convey changes in month, year, etc. It is very nice!

You can read the {scales} documentation here or in your Help pane. It has many useful functions.

ggplot(
  data = surv,
  mapping = aes(x = date_onset)) +
geom_histogram() +
scale_x_date(date_breaks = "2 months",
             labels = label_date_short() )

Using the code above, adjust the date_breaks = value to "2 weeks". What happens and how does label_date_short() adjust to account for this?

Display percents

The {scales} package that you just learned about has another useful function, percent(), that can adjust axes to fluidly display percents, even though they are decimals in the data.

If you were to modify the values in your data to display the character symbol "%", then your values would become characters themselves! As you know, "36" is different from 36. And on an axis, it will not sort intuitively if it is a character.

Thankfully, we have the percent() function.

We can easily display proportions as percents with percent() from scales within scale_y_continuous().

To test this, let's create a dataset using group_by() and summarise() that creates a proportion - the weekly proportion of cases that have more than 7 days delay between symptom onset and their report date.

delay_1wk <- surv %>%                                      # begin with surveillance linelist
  mutate(diff_1wk = as.numeric(diff) > 7) %>%              # create column that is TRUE is diff is greater than 7
  group_by(week = floor_date(date_report, "week")) %>%     # create column "week" and group by it  
  summarise(                                               # begin summarise command     
    cases = n(),                                           # number of cases in the week
    delayed = sum(diff_1wk == TRUE, na.rm=T),              # number of delayed cases in the week 
    delayed_pct = delayed / cases)                         # calculate proportion

This new summary dataset looks like this:

delay_1wk %>% 
  knitr::kable()

Write a ggplot command using geom_line() that has the following settings:

Uses the delay_1wk dataset
week on the X-axis
delayed_pct on the Y-axis
size = 2 and color = "brown"within geom_line()

r fontawesome::fa("check", fill = "red")Click to see a solution (try it yourself first!)

ggplot(data = delay_1wk, mapping = aes(x = week, y = delayed_pct))+
  geom_line(size = 2, color = "brown")

quiz(caption = "Quiz - Y-axis percent scale",
  question("Which scale_*() command should you use to adjust this y-axis?",
    allow_retry = T,
    answer("scale_x_discrete()"),
    answer("scale_fill_continuous()"),
    answer("scale_y_continuous()", correct = T),
    answer("scale_color_discrete()")
  )
)

Now, apply the appropriate scale_*() function, and include the argument labels = percent. Note that when setting this argument equal to a function, you do not need to include the parentheses at the end of percent().

ggplot(data = delay_1wk, mapping = aes(x = week, y = delayed_pct))+
  geom_line(size = 2, color = "brown")+
  scale_y_continuous(labels = percent)

Plot labels

Static labels

Let us continue using the data frame and plot from the previous section:

ggplot(data = delay_1wk, mapping = aes(x = week, y = delayed_pct))+
  geom_line(size = 2, color = "brown")+
  scale_y_continuous(labels = percent)

Add a caption that reads: "n = 663. Report produced on 2022-04-02. Data collected from 5 major hospitals in the epidemic-affected area. Last reported case on 2014-12-21. 7 cases missing date of onset."

But! - Make each sentence start on a new line.

r fontawesome::fa("lightbulb", fill = "gold") Click to read a hint

Add a labs() function, using caption =. Within the quotes, place "\n" in each place where you want a newline to appear.

r fontawesome::fa("check", fill = "red")Click to see a solution (try it yourself first!)

ggplot(data = delay_1wk, mapping = aes(x = week, y = delayed_pct))+
  geom_line(size = 2, color = "brown")+
  labs(caption = "n = 663.\nReport produced on 2022-04-02.Data collected from 5 major hospitals in the epidemic-affected area.\nLast reported case on 2014-12-21.\n7 cases missing date of onset.")

Dynamic labels

The caption as written will work... for this particular moment and dataset. But what happens if you get an updated dataset? The caption is static and will remain the same.

The {stringr} package contains the package str_glue(), which allows us to embed code within character strings, so that the strings will update with new data.

To use this function, first, wrap the character string within str_glue(), like this: str_glue("n = 663"). This will print that n is the static number 663.

To make it dynamically reflect the number of rows in the data frame surv, you can insert curly brackets within the quotation marks. Within the brackets, insert your R code, for example, the function nrow()

str_glue("n = {nrow(surv)}") (Try running this in your R console)

Within the quotation marks, you can continue your writing, and even include other sections of code:

str_glue("n = {nrow(surv)} confirmed cases. There are {ncol(surv)} columns in the data frame") (Try running this in your R console)

Now you see the power of this... there are some other functions to help you craft an excellent caption:

unique() A {base} R function that returns the of unique values, such as unique(surv$district)
- Combine this with length() to return the number of unique values: length(unique(surv$district))
Sys.Date() Returns the current time as per your computer. Do not put anything in the parentheses.
fmt_count() is a function from the package {epikit} that if provided a data frame and logical criteria, will return a nicely formatted statement of the number of observations. For example:
- fmt_count(surv, is.na(hospital)) (try running this in your R console)

Now that you have these tools, revise your 4-sentence caption so that the numbers will all automatically update.

r fontawesome::fa("check", fill = "red")Click to see a solution (try it yourself first!)

ggplot(data = delay_1wk, mapping = aes(x = week, y = delayed_pct))+
  geom_line(size = 2, color = "brown")+
  labs(caption = str_glue("n = {nrow(surv)}.\nReport produced on {Sys.Date()}\nData collected from {length(unique(surv$hospital))-2} major hospitals in the epidemic-affected area.\nLast reported case on {max(surv$date_report, na.rm = TRUE)}.\n{fmt_count(surv, is.na(date_report))} cases missing date of onset and not shown."))

Did you catch that unique() will also count the values NA and "Other"? You may want to subtract 2 from that calculation.

Once your captions get very complex, you can arrange the str_glue() in a different way, so that it is easier to read and manage. The code is separated and placed towards the bottom, with placeholders in the text itself. See this section of the Epi R Handbook.

Theme elements

Themes are non-data design features (background, text size/color, etc).

Complete themes

These "complete themes" are easy to add.

# Try one of these...
+ theme_bw()
+ theme_classic()
+ theme_dark()
+ theme_gray()
+ theme_minimal()
+ theme_light()
+ theme_void()

Try adding them to some of your former plots.

Try the argument base_size = 16 inside the theme function, to quickly increase text sizes.

r fontawesome::fa("bookmark", fill = "orange") Which one do you prefer? Mentally bookmark to use later!

Micro-adjustments to themes

Micro-adjustments to the theme can be made with theme(). As these are mostly small layout and visual details we won't go into much detail here. More information is available in the Epi R Handbook.

The syntax for themes takes time to learn and is not used often enough to commit to memory for most R users. See this list of feature-specific arguments. Or run, theme_get() in your R window to get a list of all theme arguments in the console.

Copy and paste this example below into RStudio (theme micro-adjustments at the bottom). There is no need to type it, as that would take a long time. The purpose is just for your to see.

ggplot(data = surv,
  mapping = aes(
    x = age_years,
    y = ht_cm,
    color = gender)) +
geom_point(
  alpha = 0.7) +
scale_color_brewer(
  palette = "Pastel2",
  na.value = "grey") +
labs(
  title = "Height and age",
  subtitle = "All hospitals",
  x = "Age (years)",
  y = "Height (cm)",
  caption = "Fictional Ebola data",
  color = "Gender"
) +
theme_classic(base_size = 16) +
theme(
  legend.position = "bottom",                # move legend to bottom
  plot.title = element_text(color = "red",   # title color
                            size = 20,       # title font size
                            face = "bold"),  # title typeface
  axis.title.y = element_text(angle = 0))    # rotate y axis title to be horizontal

These theme elements follow a similar 2-part syntax much like mapping = aes() where we pass a function to an argument of a higher level function (here theme()).

Remember to add any adjustments after any pre-built themes

Some useful theme options are presented below:

theme() argument | What it adjusts ------------------------------------|------------------- plot.title = element_text() | The title plot.subtitle = element_text() | The subtitle plot.caption = element_text() | The caption (family, face, color, size, angle, vjust, hjust…) axis.title = element_text() | Axis titles (both x and y) (size, face, angle, color…) axis.title.x = element_text() | Axis title x-axis only (use .y for y-axis only) axis.text = element_text() | Axis text (both x and y) axis.text.x = element_text() | Axis text x-axis only (use .y for y-axis only) axis.ticks = element_blank() | Remove axis ticks axis.line = element_line() | Axis lines (colour, size, linetype: solid dashed dotted etc) strip.text = element_text() | Facet strip text (colour, face, size, angle…) strip.background = element_rect() | facet strip (fill, colour, size…)

The main adjustments you are likely to make regularly are to do with the plot legend.position =. Default options are "top", "bottom", "left", "right" and "none" (to hide the legend completely).

The legend position can also be set more specifically with c(x,y) where x and y refer to the position along the x or y axis as a proportion of the total length (ie. bottom right is c(1,0))

Most other theme elements can also be turned off using element_blank() e.g. to turn off minor y-axis grid lines and legend title:

Check your understanding of the basics of themes below:

quiz(
  question("Should adjustments to the theme be made before or after setting one of the prebuilt complete themes?",
    answer("before"),
    answer("after", correct = TRUE)
  ),
  question("Which of the following are prebuilt complete themes in ggplot?",
    answer("theme_bw()", correct = TRUE),
    answer("theme_classic()", correct = TRUE),
    answer("theme_red()"),
    answer("scale_color_brewer()")
  ),
  question("How would you hide a legend in ggplot?",
    answer("theme(legend.title = 'element.blank()')"),
    answer("theme(legend.position = 'right')"),
    answer("theme(legend.position(`none`))"),
    answer("theme(legend.position = 'none')", correct = TRUE)
  ),
  question("How would you set your legend to appear in the centre of your graph?",
    answer("theme(legend.position = 'middle')"),
    answer("theme(legend.position =  c(0.5,0.5))", correct = TRUE)
  )
)

End

Congratulations! You finished the entire module on ggplot! This is a difficult topic, but you now have all the essential tools to go forth and make many kinds of plots.

Extras

Saving plots

Exporting ggplots is made easy with the ggsave() function from {ggplot2}. This is written as a separate command from your ggplot() command. It can run in two ways, either:

Save your plot as an object (with a name), then run the ggsave() command specifying the plot name your desired file path:
ggsave(my_plot, "my_plot.png")
To save the last plot that was printed, run the command specifying only the desired file path:
ggsave("my_plot.png")

You can export as png, pdf, jpeg, tiff, bmp, svg, or several other file types by specifying the file extension in the file path.

You can also provide the arguments width =, height =, and units = (either “in”, “cm”, or “mm”). You can also provide dpi = with a number for plot resolution (e.g. 300). See the function details by entering ?ggsave or reading the documentation online.

Remember that you can use here() syntax to provide the desired file path.

ggsave(my_plot, here("outputs", "epiweek21", "my_plot.png"))

Pre-made themes

Other R users have created pre-made themes that you can use in your plots! For example, the {ggthemes} package contains themes that will make your plots look like they are from The Economist, The Wall Street Journal, Tufte, and even STATA!

appliedepi/introexercises documentation built on April 22, 2024, 1:01 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

appliedepi/introexercises

In appliedepi/introexercises:

Introduction to R for Applied Epidemiology and Public Health

Welcome

Data visualization

Format

Getting Help

Quiz questions

Icons

License

Learning objectives

Prepare

Prepare your script

Load packages

Run previous code

Color scales

Default color scales

Adjust fill

Built-in color scales

Continuous color scales

Axes scales

Adjusting Y-axis

Starting scales at 0

Flip axes

Date axis labels

Manual date breaks

Date axis labels

Auto-efficient date axes

Display percents

Plot labels

Static labels

Dynamic labels

Theme elements

Complete themes

Micro-adjustments to themes

End

Extras

Saving plots

Pre-made themes

R Package Documentation

Browse R Packages

We want your feedback!