In appliedepi/introexercises:

# load packages ----------------------------------------------------------------
library(introexercises)
library(learnr)
library(gradethis)
library(flair)
library(dplyr)
library(ggplot2)
library(gghighlight)
library(janitor)
library(stringr)
library(lubridate)
library(epikit)
library(fontawesome)
library(scales)
# library(RMariaDB)        # connect to sql database 

## set options for exercises and checking ---------------------------------------

## Define how exercises are evaluated 
gradethis::gradethis_setup(
  ## note: the below arguments are passed to learnr::tutorial_options
  ## set the maximum execution time limit in seconds
  exercise.timelimit = 60, 
  ## set how exercises should be checked (defaults to NULL - individually defined)
  # exercise.checker = gradethis::grade_learnr
  ## set whether to pre-evaluate exercises (so users see answers)
  exercise.eval = FALSE 
)

# ## event recorder ---------------------------------------------------------------
# ## see for details: 
# ## https://pkgs.rstudio.com/learnr/articles/publishing.html#events
# ## https://github.com/dtkaplan/submitr/blob/master/R/make_a_recorder.R
# 
# ## connect to your sql database
# sqldtbase <- dbConnect(RMariaDB::MariaDB(),
#                        user     = Sys.getenv("userid"),
#                        password = Sys.getenv("pwd"),
#                        dbname   = 'excersize_log',
#                        host     = "144.126.246.140")
# 
# 
# ## define a function to collect data 
# ## note that tutorial_id is defined in YAML
#     ## you could set the tutorial_version too (by specifying version:) but use package version instead 
# recorder_function <- function(tutorial_id, tutorial_version, user_id, event, data) {
#     
#   ## define a sql query 
#   ## first bracket defines variable names
#   ## values bracket defines what goes in each variable
#   event_log <- paste("INSERT INTO responses (
#                        tutorial_id, 
#                        tutorial_version, 
#                        date_time, 
#                        user_id, 
#                        event, 
#                        section,
#                        label, 
#                        question, 
#                        answer, 
#                        code, 
#                        correct)
#                        VALUES('", tutorial_id,  "', 
#                        '", tutorial_version, "', 
#                        '", format(Sys.time(), "%Y-%m-%d %H:%M:%S %Z"), "',
#                        '", Sys.getenv("SHINYPROXY_PROXY_ID"), "',
#                        '", event, "',
#                        '", data$section, "',
#                        '", data$label,  "',
#                        '", paste0('"', data$question, '"'),  "',
#                        '", paste0('"', data$answer,   '"'),  "',
#                        '", paste0('"', data$code,     '"'),  "',
#                        '", data$correct, "')",
#                        sep = '')
# 
#     # Execute the query on the sqldtbase that we connected to above
#     rsInsert <- dbSendQuery(sqldtbase, event_log)
#   
# }
# 
# options(tutorial.event_recorder = recorder_function)

# hide non-exercise code chunks ------------------------------------------------
knitr::opts_chunk$set(echo = FALSE)

# data prep --------------------------------------------------------------------
surv <- rio::import(system.file("dat/surveillance_linelist_clean_20141201.rds", package = "introexercises"))

Introduction to R for Applied Epidemiology and Public Health

Welcome

Welcome to the course "Introduction to R for applied epidemiology", offered by Applied Epi - a nonprofit organisation and the leading provider of R training, support, and tools to frontline public health practitioners.

knitr::include_graphics("images/logo.png", error = F)

Data visualization

This exercise focuses on scales and themes in {ggplot2}.

Format

This exercise guides you through tasks that you should perform in RStudio on your local computer.

Getting Help

There are several ways to get help:

1) Look for the "helpers" (see below)
2) Ask your live course instructor/facilitator for help
3) Schedule a 1-on-1 call with an instructor for "Course Tutoring"
4) Post a question in Applied Epi Community

Here is what those "helpers" will look like:

r fontawesome::fa("lightbulb", fill = "gold") Click to read a hint

Here you will see a helpful hint!

r fontawesome::fa("check", fill = "red")Click to see a solution (try it yourself first!)

linelist %>% 
  filter(
    age > 25,
    district == "Bolo"
  )

Here is more explanation about why the solution works.

Quiz questions

Answering quiz questions will help you to comprehend the material. The answers are not recorded.

To practice, please answer the following questions:

quiz(
  question_radio("When should I view the red 'helper' code?",
    answer("After trying to write the code myself", correct = TRUE),
    answer("Before I try coding", correct = FALSE),
    correct = "Reviewing best-practice code after trying to write it yourself can help you improve",
    incorrect = "Please attempt the exercise yourself, or use the hint, before viewing the answer."
  )
)

question_numeric(
 "How anxious are you about beginning this tutorial - on a scale from 1 (least anxious) to 10 (most anxious)?",
 answer(10, message = "Try not to worry, we will help you succeed!", correct = T),
 answer(9, message = "Try not to worry, we will help you succeed!", correct = T),
 answer(8, message = "Try not to worry, we will help you succeed!", correct = T),
 answer(7, message = "Try not to worry, we will help you succeed!", correct = T),
 answer(6, message = "Ok, we will get there together", correct = T),
 answer(5, message = "Ok, we will get there together", correct = T),
 answer(4, message = "I like your confidence!", correct = T),
 answer(3, message = "I like your confidence!", correct = T),
 answer(2, message = "I like your confidence!", correct = T),
 answer(1, message = "I like your confidence!", correct = T),
 allow_retry = TRUE,
 correct = "Thanks for sharing. ",
 min = 1,
 max = 10,
 step = 1
)

License

Please email contact@appliedepi.org with questions about the use of these materials.

Learning objectives

In this exercise you will:

Practice adjusting the scales commands within {ggplot2}
Make adjustments to the themes of ggplots
Save ggplots as PNG files

Prepare

Prepare your script

Open the "ebola" RStudio project and your script "ebola_analysis.R", as usual.

Load packages

Add the following R packages to your {pacman} command at the top of your script:

{RColorBrewer} for color palettes
{viridis} for more color palettes
{scales} for formatting

Run the code to install your R packages.

Run previous code

If you left RStudio since the previous exercise, clear your Environment and then re-run all code in your script to create the clean surv dataset.

If you encounter errors, you always have the option of importing the backup clean dataset from "data/clean/backup" folder, "surveillance_linelist_clean_20141201.rds".

Color scales

ggplot2's scale commands replace the default display of the aesthetic mappings, such as:

Which colors or shapes to display
The min/max of point sizes
The min/max and frequency of axes breaks

As a generic formula, these commands are written as: scale_AESTHETIC_METHOD().

scale_ : this prefix never changes
AESTHETIC: _fill_ , _color_ , _x_ , _y_ , etc.
METHOD: _continuous(), _discrete(), _manual(), _date(), etc.

Some examples of scale commands:

You want to adjust |Scale command
----------------------|-------------------
continuous y-axis |scale_y_continuous()
date x-axis |scale_x_date()
categorical x-axis |scale_x_discrete()
fill, continuous |scale_fill_continuous()
fill, continuous |scale_fill_gradient()
color, manual assignment|scale_color_manual()

Here we show two different ways to create a continuous color gradient.

The scale_*_continuous() functions work with pre-built gradient palettes
scale_*_gradient() creates a 2-color gradient
scale_*_gradient2() also lets you set a midpoint color between these two
scale_*_gradientn() lets you create more complex palettes from any number of colors

More information on these functions is available here.
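As a minimal sketch of the gradient functions listed above, the code below applies each one to the same scatter plot. The toy data frame and the chosen colors are assumptions for illustration only; with the course data you would start from surv instead.

```r
library(ggplot2)

# Hypothetical toy data (not the course linelist)
toy <- data.frame(
  age_years = c(2, 15, 30, 45, 60, 75),
  wt_kg     = c(12, 50, 68, 72, 70, 65),
  temp      = c(36.5, 37.0, 37.8, 38.4, 39.1, 40.0)
)

base_plot <- ggplot(
  data = toy,
  mapping = aes(x = age_years, y = wt_kg, color = temp)) +
  geom_point(size = 3)

# Two-color gradient: low values yellow, high values red
p_two <- base_plot +
  scale_color_gradient(low = "yellow", high = "red")

# Diverging gradient: white at a chosen midpoint (38 is an arbitrary example)
p_mid <- base_plot +
  scale_color_gradient2(low = "blue", mid = "white", high = "red", midpoint = 38)

# Gradient through any number of colors
p_n <- base_plot +
  scale_color_gradientn(colours = c("navy", "skyblue", "gold", "darkred"))
```

Print p_two, p_mid, and p_n one at a time to compare how the same temperatures are colored.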

Default color scales

Try out the following code in your "Simple plots" section:

ggplot(
  data = surv,
  mapping = aes(
    x = district,
    fill = sex)) +
geom_bar()

Above, the fill of a bar plot uses the default colors and axis breaks.

We can adjust these aspects by adding a scale function (+) to the end of our plotting command.

Adjust fill

Below, we update the command to adjust the fill of the bars manually, using scale_fill_manual().

Our dataset includes two values for sex ("male" and "female"), and we assign the desired fills for each within a vector c(). To assign a fill for NA values we specify this with the separate argument na.value =.

ggplot(
  data = surv,
  mapping = aes(
    x = district,
    fill = sex)) +
geom_bar() +
scale_fill_manual(        
  values = c(
   "male" = "violetred", 
   "female" = "aquamarine"),
   na.value = "green")

Here we have chosen some ugly colors to showcase what we are changing!

Try changing the color for "male" to "dodgerblue" and "female" to "tomato" for a nicer color combination. Also set NA to be "grey".

Note that the character values you put in the vector c() need to match the values in the data exactly (e.g. "Male" is NOT the same as "male").

r fontawesome::fa("check", fill = "red")Click to see a solution (try it yourself first!)

ggplot(
  data = surv,
  mapping = aes(
    x = district,
    fill = sex)) +
geom_bar() +
scale_fill_manual(        
  values = c(
    "male" = "dodgerblue",
    "female" = "tomato"),
  na.value = "grey")
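To see why exact matching matters, here is a small sketch that compares a correctly named vector with one that uses "Male" instead of "male". The toy data frame is an assumption for illustration. layer_data() from {ggplot2} returns the values actually drawn, so you can inspect the fill assigned to each bar segment.

```r
library(ggplot2)

# Hypothetical toy data standing in for the linelist
toy <- data.frame(
  district = c("A", "A", "B", "B", "B"),
  sex      = c("male", "female", "male", "female", "female")
)

base_plot <- ggplot(
  data = toy,
  mapping = aes(x = district, fill = sex)) +
  geom_bar()

# Names match the data values exactly
p_match <- base_plot +
  scale_fill_manual(
    values   = c("male" = "dodgerblue", "female" = "tomato"),
    na.value = "grey")

# "Male" does not match "male", so those bars fall back to na.value
p_mismatch <- base_plot +
  scale_fill_manual(
    values   = c("Male" = "dodgerblue", "female" = "tomato"),
    na.value = "grey")

# Inspect the fill assigned to each bar segment
unique(layer_data(p_match)$fill)
unique(layer_data(p_mismatch)$fill)
```

In the mismatched plot no error is raised, which is why this kind of typo is easy to miss.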

Built-in color scales

{ggplot2} and the package {RColorBrewer} offer a number of pre-configured color palettes that are continuous, discrete, diverging, etc.

As we are working here with discrete data, we can use the function scale_fill_brewer() to access the following palettes rather than specifying our own colors:

This command displays the color palettes in the package {RColorBrewer}, and their shorthand codes (e.g. "BuGn" for Blue-to-Green):

RColorBrewer::display.brewer.all()

As we are working with a discrete scale, the middle group of color palettes is most appropriate. Below we select the palette "Pastel2", and specify that missing values should be "grey".

ggplot(
  data = surv,
  mapping = aes(
    x = district,
    fill = sex)) +
geom_bar() +
scale_fill_brewer(
  palette = "Pastel2",
  na.value = "grey")
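If you want to see the colors behind a palette name, a short sketch like the one below may help. It uses brewer.pal() and brewer.pal.info from {RColorBrewer}; asking for three colors is an arbitrary choice for illustration.

```r
library(RColorBrewer)

# HEX codes for the first three colors of the "Pastel2" palette
pastel_cols <- brewer.pal(n = 3, name = "Pastel2")
pastel_cols

# Palette details, including the maximum number of colors it supports
brewer.pal.info["Pastel2", ]
```

The maximum number of colors is worth checking before mapping a column with many categories to a brewer palette.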

Color-blind friendly viridis palettes are available as well. They originate from the {viridis} package, and {ggplot2} provides them in discrete and continuous forms with scale_fill_viridis_d() and scale_fill_viridis_c():

ggplot(
  data = surv,
  mapping = aes(
    x = district,
    fill = sex)) +
geom_bar() +
scale_fill_viridis_d(na.value = "grey")

Adjust both ggplot commands above to use age_cat instead of sex.

How do the brewer and viridis_d scales look with more categories?

quiz(caption = "Quiz - color brewer",
  question("What color is the age group 10-19 using scale_fill_brewer(palette = 'Pastel2')?",
    allow_retry = T,
    answer("green"),
    answer("yellow"),
    answer("orange", correct = T),
    answer("brown")
  ),
  question("What is the importance of the 'fill' in the commands?",
    allow_retry = T,
    answer("It instructs the plot to be filled with joy"),
    answer("Because this is a bar plot, fill is the aesthetic that produces the colors in the bars", correct = T),
    answer("It sets the background color (grey)"),
    answer("It adjusts the spaces between the bars")
  )
)

Continuous color scales

Apply what you have learned to add a continuous viridis palette to the following plot.

Be aware that here we are dealing with a color rather than fill aesthetic because we use geom_point(). It is best practice to specify an na.value = (e.g. "grey").

ggplot(
  data = surv,
  mapping = aes(
    x = age_years,
    y = wt_kg,
    color = temp)) +
geom_point()

r fontawesome::fa("check", fill = "red")Click to see a solution (try it yourself first!)

ggplot(
  data = surv,
  mapping = aes(
    x = age_years,
    y = wt_kg,
    color = temp)) +
geom_point() +
scale_color_viridis_c(na.value = "grey")

Here are some further resources for you:

Viridis palette functions (try them with option = "plasma" or "inferno") and colorbrewer palette functions can be added to any ggplot. The colorbrewer website is also great for identifying HEX color codes!

Axes scales

We can edit axes in a similar way, with similar commands.

Adjusting Y-axis

In a barplot such as the one below, we have a continuous Y-axis and discrete X-axis. Here we might decide that the counts on the Y-axis are not descriptive enough, so we wish to supply our own break points.

ggplot(
  data = surv,
  mapping = aes(
    x = district,
    fill = sex)) +
geom_bar() +
scale_fill_viridis_d(na.value = "grey")

In scale_y_continuous() we adjust the Y-axis breaks using seq() to define a numeric sequence.

Try running the command seq(from = 0, to = 250, by = 25) in the R Console, just to see the result. Try it again with different argument values.
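For reference, here is a small sketch of what seq() returns. The second call uses values chosen only for illustration.

```r
# Breaks every 25 from 0 to 250
breaks_25 <- seq(from = 0, to = 250, by = 25)
breaks_25
# 0 25 50 75 100 125 150 175 200 225 250

# If "to" cannot be reached in whole steps, the sequence stops below it
breaks_30 <- seq(from = 0, to = 100, by = 30)
breaks_30
# 0 30 60 90
```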

Now, add the function scale_y_continuous() to the above plot. Inside this function, set the argument breaks = to seq() as written above.

r fontawesome::fa("check", fill = "red")Click to see a solution (try it yourself first!)

ggplot(
  data = surv,
  mapping = aes(
    x = district,
    fill = sex)) +
geom_bar() +
scale_fill_viridis_d(na.value = "grey") +
scale_y_continuous(
  breaks = seq(
    from = 0,
    to = 250,
    by = 25))

Starting scales at 0

You may have noticed that {ggplot2} has a behavior of expanding your axis beyond the data, with a gap between the values and the axis at the bottom. This can be fixed with scale commands for the X and Y axes, using their expand = argument.

Update the previous ggplot() command to include a second argument to scale_y_continuous() that is expand = c(0,0).

This tells the plot to start the Y-axis at the plot coordinates (0,0) with no buffer space.

r fontawesome::fa("check", fill = "red")Click to see a solution (try it yourself first!)

ggplot(
  data = surv,
  mapping = aes(
    x = district,
    fill = sex)) +
geom_bar() +
scale_fill_viridis_d(na.value = "grey") +
scale_y_continuous(breaks = seq(from = 0,
                                to = 250,
                                by = 25),
                   expand = c(0,0))
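As a related sketch that is not part of the original exercise, {ggplot2} also provides the helper expansion(), which lets you remove the gap at the bottom of the axis while keeping a little headroom at the top. The toy data and the 5% value are assumptions for illustration.

```r
library(ggplot2)

# Hypothetical toy data
toy <- data.frame(district = c("A", "A", "B", "B", "B"))

p <- ggplot(
  data = toy,
  mapping = aes(x = district)) +
  geom_bar() +
  scale_y_continuous(
    expand = expansion(mult = c(0, 0.05)))  # no padding below, 5% padding above

# expansion() returns a length-4 numeric vector:
# lower multiplier, lower additive, upper multiplier, upper additive
pad <- expansion(mult = c(0, 0.05))
pad
```

Using expand = c(0,0) as in the exercise removes the padding at both ends of the axis.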

Now, try applying the same expand = c(0,0) syntax to the discrete x-axis, by adding scale_x_discrete():

r fontawesome::fa("check", fill = "red")Click to see a solution (try it yourself first!)

ggplot(
  data = surv,
  mapping = aes(
    x = district,
    fill = sex)) +
geom_bar() +
scale_fill_viridis_d(na.value = "grey") +
scale_y_continuous(
  breaks = seq(
    from = 0,
    to = 250,
    by = 25),
  expand = c(0,0)) +
scale_x_discrete(expand = c(0,0))

Flip axes

Finally, flip the X and Y axes by adding coord_flip() (with empty parentheses).

This is useful in bar charts so that more discrete value names can be displayed without overlapping each other on the x-axis.

ggplot(
  data = surv,
  mapping = aes(
    x = district,
    fill = sex)) +
geom_bar() +
scale_fill_viridis_d(na.value = "grey") +
scale_y_continuous(
  breaks = seq(
    from = 0,
    to = 250,
    by = 25),
  expand = c(0,0)) +
scale_x_discrete(expand = c(0,0))+
coord_flip()

Note that if adjusting the labels with labs() you still edit x = to adjust the label that now appears on the y-axis.

Date axis labels

Date axes also have scales that can be adjusted with scale functions.

The default scale for date axis labels will vary by the range of your data. Here is an example plot:

ggplot(
  data = surv,
  mapping = aes(x = date_onset)) +
geom_histogram()

Let's try adjusting the date axis labels with scale_x_date().

Manual date breaks

Within scale_x_date(), you can use the argument date_breaks = to provide values like "1 week", "2 weeks", or "3 months".

Note: these are breaks for the axis labels. This does not impact the bins of the histogram (the bar widths). We will discuss best practices for setting the binwidths of histograms for epidemic curves in a subsequent module.

Try editing this code so that the date labels appear every 2 months:

ggplot(
  data = surv,
  mapping = aes(x = date_onset)) +
geom_histogram()

r fontawesome::fa("check", fill = "red")Click to see a solution (try it yourself first!)

ggplot(
  data = surv,
  mapping = aes(x = date_onset)) +
geom_histogram() +
scale_x_date(date_breaks = "2 months")

quiz(caption = "Quiz - date breaks",
  question("What is the default date format displayed when setting 'date_breaks =' ?",
    allow_retry = T,
    answer("MM/DD/YYYY"),
    answer("DD/MM/YYYY"),
    answer("YYYY-MM-DD", correct = T),
    answer("YYYY-DD-MM")
  )
)

Date axis labels

You (or your supervisor) may not like the date labels appearing as YYYY-MM-DD.

You can specify the format of the date labels with the argument date_labels =.

This argument accepts a character value (written within quotes) constructed using "strptime" syntax - click here for more information

For example: the strptime format "%d %b %Y" will change the display of the date labels to DD MMM YYYY (note spaces instead of dashes). You can also trigger a new line with \n, e.g. to move the year below the day and month.

ggplot(
  data = surv,
  mapping = aes(x = date_onset)) +
geom_histogram() +
scale_x_date(
  date_breaks = "2 months",     # specify date label interval
  date_labels = "%d %b \n %Y" ) # specify how date labels appear

Here is a list of common strptime abbreviations:

%d = Day number of month (5, 17, 28, etc.)
%j = Day number of the year (Julian day 001-366)
%a = Abbreviated weekday (Mon, Tue, Wed, etc.)
%A = Full weekday (Monday, Tuesday, etc.)
%w = Weekday number (0-6, Sunday is 0)
%u = Weekday number (1-7, Monday is 1)
%W = Week number (00-53, Monday is week start)
%U = Week number (00-53, Sunday is week start)
%m = Month number (e.g. 01, 02, 03, 04)
%b = Abbreviated month (Jan, Feb, etc.)
%B = Full month (January, February, etc.)
%y = 2-digit year (e.g. 89)
%Y = 4-digit year (e.g. 1989)
%H = hours (24-hr clock)
%M = minutes
%S = seconds
%z = offset from GMT
%Z = Time zone (character)

See Epi R Handbook Epicurves and Strings pages for more tips
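You can test any of these abbreviations outside of a plot with the base R function format(). This is a minimal sketch; the example date is arbitrary.

```r
example_date <- as.Date("2014-12-01")

# Numeric codes give the same result on any computer
format(example_date, "%d/%m/%Y")   # "01/12/2014"
format(example_date, "%Y-%m-%d")   # "2014-12-01"
format(example_date, "%j")         # "335" (day number of the year)

# Month and weekday names depend on your computer's language settings
format(example_date, "%d %b %Y")
format(example_date, "%A")
```

Once the text looks right in the console, copy the same format string into date_labels =.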

Auto-efficient date axes

There is also a simple tool for date labels in the {scales} package.

Confusingly, this approach is applied using the labels = rather than date_labels = argument, but it is worth it!

Assign labels = to label_date_short(). Be sure to write the empty parentheses. This produces date axis labels that automatically show the least amount of information necessary to convey changes in month, year, etc. It is very nice!

You can read the {scales} package documentation here or in your Help pane. It has many useful functions.

ggplot(
  data = surv,
  mapping = aes(x = date_onset)) +
geom_histogram() +
scale_x_date(
  date_breaks = "2 months",     # 2-month interval for date labels
  labels = label_date_short() ) # auto-efficient date labels
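The empty parentheses matter because label_date_short() does not return labels itself: it returns a function that ggplot then applies to the axis breaks. The sketch below calls that function directly on a few arbitrary example dates so you can see the labels it would produce.

```r
library(scales)

example_dates <- as.Date(c("2014-11-01", "2014-12-01", "2015-01-01", "2015-02-01"))

# label_date_short() returns a labelling function
make_labels <- label_date_short()

# Calling that function on dates returns one label per date
short_labels <- make_labels(example_dates)
short_labels
```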

Adjust the date_breaks = value to "2 weeks".

What happens and how does label_date_short() adjust to account for this?

Display percents

The {scales} package has another useful function, percent(), that can fluidly adjust axes to display percents, even though in the data the values are decimal proportions.

In contrast, if you were to attempt to modify the underlying values to display a character "%" symbol, then your values would become characters and not be numeric! As you know, the character "36" is different from the number 36. On an axis, character values will not behave intuitively.
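A minimal sketch of the difference, using three arbitrary proportions:

```r
library(scales)

proportions <- c(0.05, 0.36, 1)

# Pasting "%" onto the values turns them into characters
as_text <- paste0(proportions * 100, "%")
is.numeric(as_text)   # FALSE
sort(as_text)         # sorted as text, not by size: "100%" "36%" "5%"

# percent() only formats the display; the original values stay numeric
percent(proportions, accuracy = 1)   # "5%" "36%" "100%"
is.numeric(proportions)              # TRUE
```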

Thankfully, the percent() function accepts decimal proportions and displays them as percents, retaining their numeric properties. It can be used within scale_y_continuous() in a plotting command.

To test this, create the dataset below using group_by() and summarise(). It contains the weekly proportion of cases with more than 7 days of delay between the date of symptom onset and the report date.

delay_1wk <- surv %>%                                      # begin with surveillance linelist
  mutate(diff_1wk = as.numeric(diff) > 7) %>%              # create column that is TRUE if diff is greater than 7
  group_by(week = floor_date(date_report, "week")) %>%     # create column "week" and group by it  
  summarise(                                               # begin summarise command     
    cases = n(),                                           # number of cases in the week
    delayed = sum(diff_1wk == TRUE, na.rm=T),              # number of delayed cases in the week 
    delayed_pct = delayed / cases)                         # calculate proportion

This new summary dataset looks like this:

delay_1wk %>% 
  knitr::kable()
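If the summarising steps are hard to follow, it can help to run the same pipeline on a tiny data frame where you can check the result by hand. The four-row data frame below is hypothetical; its column names simply mirror those used above.

```r
library(dplyr)
library(lubridate)

# Hypothetical four-row linelist; "diff" is the delay in days
toy <- data.frame(
  date_report = as.Date(c("2014-11-03", "2014-11-05", "2014-11-10", "2014-11-12")),
  diff        = c(3, 10, 8, NA)
)

toy_delay <- toy %>%
  mutate(diff_1wk = as.numeric(diff) > 7) %>%            # TRUE, FALSE, or NA
  group_by(week = floor_date(date_report, "week")) %>%   # weeks start on Sunday by default
  summarise(
    cases       = n(),                                   # rows per week
    delayed     = sum(diff_1wk == TRUE, na.rm = TRUE),   # missing delays are not counted as delayed
    delayed_pct = delayed / cases)                       # proportion, not yet a percent

toy_delay
```

Each week has two cases and one delayed case, so delayed_pct is 0.5 in both rows. Note that the case with a missing delay still counts in the denominator.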

Write a ggplot command using geom_line() that has the following settings:

Uses the delay_1wk dataset
week is on the X-axis
delayed_pct on the Y-axis
size = 2 and color = "brown" within geom_line()

r fontawesome::fa("check", fill = "red")Click to see a solution (try it yourself first!)

ggplot(
  data = delay_1wk,
  mapping = aes(
    x = week,
    y = delayed_pct))+
geom_line(
  size = 2,
  color = "brown")

quiz(caption = "Quiz - Y-axis percent scale",
  question("Which scale_*() command should you use to adjust this y-axis?",
    allow_retry = T,
    answer("scale_x_discrete()"),
    answer("scale_fill_continuous()"),
    answer("scale_y_continuous()", correct = T),
    answer("scale_color_discrete()")
  )
)

Now, apply the appropriate scale function, and include the argument labels = percent.

Note that when setting this argument equal to a function, you do not need to include the parentheses at the end of percent().

ggplot(
  data = delay_1wk,
  mapping = aes(
    x = week,
    y = delayed_pct))+
geom_line(
  size = 2,
  color = "brown")+
scale_y_continuous(labels = percent)

Plot labels

Static labels

Let us continue using the data frame and plot from the previous section:

ggplot(
  data = delay_1wk,
  mapping = aes(
    x = week,
    y = delayed_pct))+
geom_line(
  size = 2,
  color = "brown")+
scale_y_continuous(labels = percent)

Add a caption that reads:

"n = 663. Report produced on 2022-04-02. Data collected from 5 major hospitals in the epidemic-affected area. Last reported case on 2014-12-21. 7 cases missing date of onset."

But! - Make each sentence start on a new line.

r fontawesome::fa("lightbulb", fill = "gold") Click to read a hint

Add a labs() function and use the caption = argument within the labs() function to add a caption. Within quotes, place "\n" in each place where you want a newline to appear.

r fontawesome::fa("check", fill = "red")Click to see a solution (try it yourself first!)

ggplot(data = delay_1wk, mapping = aes(x = week, y = delayed_pct))+
  geom_line(size = 2, color = "brown")+
  labs(caption = "n = 663.\nReport produced on 2022-04-02.\nData collected from 5 major hospitals in the epidemic-affected area.\nLast reported case on 2014-12-21.\n7 cases missing date of onset.")

Dynamic labels

This caption as written will work... for this particular moment and dataset. But what happens if you get an updated dataset? The caption is static and will remain the same.

The {stringr} package contains the function str_glue(), which allows us to embed code within character text, so that values will update when the plot is re-run with new data.

To use this function, first, wrap the character string within str_glue(), like this: str_glue("n = 663"). This will print that n is the static number 663.

To make it dynamically reflect the number of rows in the data frame surv, you can insert curly brackets within the quotation marks. Within the brackets, insert your R code, for example, the function nrow()

str_glue("n = {nrow(surv)}") (Try running this in your R console)

Notice that this prints "n = " followed by the number of rows in surv, rather than the verbatim text n = {nrow(surv)}.

As you can imagine, str_glue() is a very valuable function, as it allows us to easily reference information from our surv data frame without having to know or update the exact value in our script any time there are changes to our data. This is especially useful when we have data that is updated on a daily, weekly, or monthly basis, such as a case linelist when we are tracking an ongoing outbreak.

Within the quotation marks in the str_glue() function, you can also continue your writing, and even include other sections of code:

str_glue("n = {nrow(surv)} confirmed cases.") (Try running this directly in your R console, or in the "Testing area" section of your script)

We can even expand this further to identify, for example, the number of columns in our data frame using the ncol() function.

str_glue("n = {nrow(surv)} confirmed cases. There are {ncol(surv)} columns in the data frame") (Try running this directly in your R console, or in the "Testing area" section of your script)

Now you see the power of this... there are some other functions to help you craft an excellent caption:

unique() A {base} R function that returns the unique values, such as unique(surv$district)
- Combine this with length() to return the number of unique values: length(unique(surv$district))
Sys.Date() Returns the current date as per your computer. Do not put anything in the parentheses.
fmt_count() is a function from the package {epikit} that if provided a data frame and logical criteria, will return a nicely formatted statement of the number of observations. For example:
- fmt_count(surv, is.na(hospital)) (Try running this directly in your R console, or in the "Testing area" section of your script)
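Here is a minimal sketch that combines several of these helpers in one str_glue() call. The toy data frame is hypothetical, and sum(is.na()) is used as a plain base R way to count missing values (it is a swapped-in alternative, not what fmt_count() returns).

```r
library(stringr)

# Hypothetical toy data standing in for surv
toy <- data.frame(
  hospital   = c("Central", "Port", NA, "Other", "Port"),
  date_onset = as.Date(c("2014-11-01", NA, "2014-11-05", "2014-11-09", NA))
)

n_hospitals <- length(unique(toy$hospital))   # counts NA and "Other" as values too
n_missing   <- sum(is.na(toy$date_onset))     # number of missing onset dates

caption_text <- str_glue(
  "n = {nrow(toy)}.\n{n_hospitals} distinct hospital values.\n{n_missing} cases missing date of onset.")
caption_text
```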

Now that you have these tools, revise your caption so that all of the numbers update automatically. We can replace the number of cases with {nrow(surv)} and the date the report was produced with {Sys.Date()}. Think about which other parts of the caption could be generated by functions, then see the solution below.

r fontawesome::fa("check", fill = "red")Click to see a solution (try it yourself first!)

ggplot(data = delay_1wk, mapping = aes(x = week, y = delayed_pct))+
  geom_line(size = 2, color = "brown")+
  labs(caption = str_glue("n = {nrow(surv)}.\nReport produced on {Sys.Date()}.\nData collected from {length(unique(surv$hospital))-2} major hospitals in the epidemic-affected area.\nLast reported case on {max(surv$date_report, na.rm = TRUE)}.\n{fmt_count(surv, is.na(date_onset))} cases missing date of onset and not shown."))

Notice that unique() will also count the values NA and "Other" for our hospital counts. As such, you may want to subtract 2 from that calculation to ensure you are reporting only the major hospitals.

Once your captions get very complex, you can arrange the str_glue() in a different way, so that it is easier to read and manage. We will discuss this in the next module as well. In brief, the code is separated and placed towards the bottom of the function, with placeholders in the text itself. See this section of the Epi R Handbook if you are interested.
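A brief sketch of that arrangement: the text contains only short placeholders, and the code that fills them is supplied as named arguments after the text. The placeholder names n_cases and report_date and the toy data frame are hypothetical.

```r
library(stringr)

# Hypothetical toy data standing in for surv
toy <- data.frame(case_id = 1:3)

caption_text <- str_glue(
  "n = {n_cases}.\nReport produced on {report_date}.",   # text with placeholders
  n_cases     = nrow(toy),                                # code that fills each placeholder
  report_date = Sys.Date())
caption_text
```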

Theme elements

Themes are non-data design features of the plot (the background, the text size and color, etc).

Complete themes

{ggplot2} allows you to easily add "complete themes" that transform many aspects of a plot with one simple function.

theme_bw()
theme_classic()
theme_dark()
theme_gray()
theme_minimal()
theme_light()
theme_void()

Try adding some of these to your former plots.

Try the argument base_size = 16 inside the theme function, to quickly adjust text sizes.

Which theme do you prefer? Mentally bookmark it to use later!

Micro-adjustments to themes

Micro-adjustments to the theme can be made with theme().

We will not go into great detail here, as these adjustments are typically for small layout and visual details. More information is available here in the Epi R Handbook.

The syntax for themes takes time to learn and is not used often enough to commit to memory. See this list of feature-specific arguments, or run theme_get() in your R console to list all theme arguments.

Copy and paste this example below into RStudio console (the theme micro-adjustments are at the bottom).

There is no need to type this out, as that would take a long time. The purpose is just for you to see.

ggplot(data = surv,
  mapping = aes(
    x = age_years,
    y = ht_cm,
    color = sex)) +
geom_point(
  alpha = 0.7) +
scale_color_brewer(
  palette = "Pastel2",
  na.value = "grey") +
labs(
  title = "Height and age",
  subtitle = "All hospitals",
  x = "Age (years)",
  y = "Height (cm)",
  caption = "Fictional Ebola data",
  color = "sex"
) +
theme_classic(base_size = 16) +
theme(
  legend.position = "bottom",                # move legend to bottom
  plot.title = element_text(color = "red",   # title color
                            size = 20,       # title font size
                            face = "bold"),  # title typeface
  axis.title.y = element_text(angle = 0))    # rotate y axis title to be horizontal

Most theme elements follow a similar 2-part syntax much like mapping = aes(), where you provide a function (element_text()) as an argument to another function.

Remember to add any theme() adjustments after any pre-built themes
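The order matters because a complete theme replaces all theme settings made before it. As a small sketch, you can add theme pieces together on their own, without a plot, and inspect the result:

```r
library(ggplot2)

# Complete theme first, then the adjustment: the adjustment is kept
right_order <- theme_classic() + theme(legend.position = "bottom")

# Adjustment first, then the complete theme: the adjustment is overwritten
wrong_order <- theme(legend.position = "bottom") + theme_classic()

right_order$legend.position   # "bottom"
wrong_order$legend.position   # "right" (the setting in theme_classic())
```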

Some useful theme adjustments are presented below:

theme() argument |What it adjusts
------------------------------------|-------------------
plot.title = element_text() |The title
plot.subtitle = element_text() |The subtitle
plot.caption = element_text() |The caption (family, face, color, size, angle, vjust, hjust…)
axis.title = element_text() |Axis titles (both x and y) (size, face, angle, color…)
axis.title.x = element_text() |Axis title x-axis only (use .y for y-axis only)
axis.text = element_text() |Axis text (both x and y)
axis.text.x = element_text() |Axis text x-axis only (use .y for y-axis only)
axis.ticks = element_blank() |Remove axis ticks
axis.line = element_line() |Axis lines (colour, size, linetype: solid dashed dotted etc)
strip.text = element_text() |Facet strip text (colour, face, size, angle…)
strip.background = element_rect() |Facet strip (fill, colour, size…)

One adjustment you are likely to make frequently is the plot's legend.position =. Options are "top", "bottom", "left", "right" and "none" (to hide the legend completely). The legend position can also be set more specifically with c(x,y), where x and y refer to the position along the x or y axis as a proportion of the total length (i.e. bottom right is c(1,0)).

Most other theme elements can also be removed using element_blank() e.g. to turn off minor y-axis grid lines and legend title.

Check your understanding of the basics of themes below:

quiz(
  question("Should adjustments to the theme be made before or after setting one of the prebuilt complete themes?",
    answer("before"),
    answer("after", correct = TRUE)
  ),
  question("Which of the following are prebuilt complete themes in ggplot?",
    answer("theme_bw()", correct = TRUE),
    answer("theme_classic()", correct = TRUE),
    answer("theme_red()"),
    answer("scale_color_brewer()")
  ),
  question("How would you hide a legend in ggplot?",
    answer("theme(legend.title = 'element.blank()')"),
    answer("theme(legend.position = 'right')"),
    answer("theme(legend.position(`none`))"),
    answer("theme(legend.position = 'none')", correct = TRUE)
  ),
  question("How would you set your legend to appear in the centre of your graph?",
    answer("theme(legend.position = 'middle')"),
    answer("theme(legend.position =  c(0.5,0.5))", correct = TRUE)
  )
)

End

Congratulations! You finished the entire module on ggplot! This is a difficult topic, but you now have all the essential tools to go forth and make many kinds of plots.

Extras

Saving plots

Exporting ggplots is made easy with the ggsave() function from {ggplot2}. This is written as a separate command from your ggplot() command. It can run in two ways, either:

Save your plot as an object (with a name), then run the ggsave() command specifying your desired file path and the plot name:
ggsave("my_plot.png", plot = my_plot)
To save the last plot that was printed, run the command specifying only the desired file path:
ggsave("my_plot.png")

You can export as png, pdf, jpeg, tiff, bmp, svg, or several other file types by specifying the file extension in the file path.

You can also provide the arguments width =, height =, and units = (either “in”, “cm”, or “mm”). You can also provide dpi = with a number for plot resolution (e.g. 300). See the function details by entering ?ggsave or reading the documentation online.
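Putting these arguments together, a minimal sketch might look like this. The toy plot, the file name, and the dimensions are assumptions for illustration, and the file is written to a temporary folder so that nothing in your project is overwritten.

```r
library(ggplot2)

# Hypothetical toy plot
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 6, 5))
my_plot <- ggplot(
  data = toy,
  mapping = aes(x = x, y = y)) +
  geom_line()

# Temporary location for the example file
out_file <- file.path(tempdir(), "my_plot.png")

ggsave(
  filename = out_file,   # the file extension sets the file type
  plot     = my_plot,    # the plot object to save
  width    = 6,
  height   = 4,
  units    = "in",
  dpi      = 300)

file.exists(out_file)
```

Naming the arguments avoids confusion about their order: the file path comes first in ggsave(), and the plot second.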

Remember that you can use here() syntax to provide the desired file path.

ggsave(here("outputs", "epiweek21", "my_plot.png"), plot = my_plot)

Pre-made themes

Other R users have created pre-made themes that you can use in your plots! For example, the {ggthemes} package contains themes that will make your plots look like they are from The Economist, The Wall Street Journal, Tufte, and even STATA!

appliedepi/introexercises documentation built on April 22, 2024, 1:01 a.m.
