In appliedepi/introexercises:

# load packages ----------------------------------------------------------------
library(introexercises)
library(learnr)
library(gradethis)
library(dplyr)
library(flair)
library(ggplot2)
library(stringr)
library(epikit)
library(lubridate)
library(fontawesome)
library(janitor)
# library(RMariaDB)        # connect to sql database 

## set options for exercises and checking ---------------------------------------

## Define how exercises are evaluated 
gradethis::gradethis_setup(
  ## note: the below arguments are passed to learnr::tutorial_options
  ## set the maximum execution time limit in seconds
  exercise.timelimit = 60, 
  ## set how exercises should be checked (defaults to NULL - individually defined)
  # exercise.checker = gradethis::grade_learnr
  ## set whether to pre-evaluate exercises (so users see answers)
  exercise.eval = FALSE 
)

# ## event recorder ---------------------------------------------------------------
# ## see for details: 
# ## https://pkgs.rstudio.com/learnr/articles/publishing.html#events
# ## https://github.com/dtkaplan/submitr/blob/master/R/make_a_recorder.R
# 
# ## connect to your sql database
# sqldtbase <- dbConnect(RMariaDB::MariaDB(),
#                        user     = Sys.getenv("userid"),
#                        password = Sys.getenv("pwd"),
#                        dbname   = 'excersize_log',
#                        host     = "144.126.246.140")
# 
# 
# ## define a function to collect data 
# ## note that tutorial_id is defined in YAML
#     ## you could set the tutorial_version too (by specifying version:) but use package version instead 
# recorder_function <- function(tutorial_id, tutorial_version, user_id, event, data) {
#     
#   ## define a sql query 
#   ## first bracket defines variable names
#   ## values bracket defines what goes in each variable
#   event_log <- paste("INSERT INTO responses (
#                        tutorial_id, 
#                        tutorial_version, 
#                        date_time, 
#                        user_id, 
#                        event, 
#                        section,
#                        label, 
#                        question, 
#                        answer, 
#                        code, 
#                        correct)
#                        VALUES('", tutorial_id,  "', 
#                        '", tutorial_version, "', 
#                        '", format(Sys.time(), "%Y-%m-%d %H:%M:%S %Z"), "',
#                        '", Sys.getenv("SHINYPROXY_PROXY_ID"), "',
#                        '", event, "',
#                        '", data$section, "',
#                        '", data$label,  "',
#                        '", paste0('"', data$question, '"'),  "',
#                        '", paste0('"', data$answer,   '"'),  "',
#                        '", paste0('"', data$code,     '"'),  "',
#                        '", data$correct, "')",
#                        sep = '')
# 
#     # Execute the query on the sqldtbase that we connected to above
#     rsInsert <- dbSendQuery(sqldtbase, event_log)
#   
# }
# 
# options(tutorial.event_recorder = recorder_function)

# hide non-exercise code chunks ------------------------------------------------
knitr::opts_chunk$set(echo = FALSE)

# data prep --------------------------------------------------------------------
surv <- rio::import(system.file("dat/surveillance_linelist_clean_20141201.rds", package = "introexercises"))

Introduction to R for Applied Epidemiology and Public Health

Welcome

Welcome to the course "Introduction to R for applied epidemiology", offered by Applied Epi - a nonprofit organisation and the leading provider of R training, support, and tools to frontline public health practitioners.

knitr::include_graphics("images/logo.png", error = F)

R Markdown

This exercise focuses on creating automated reports with R Markdown. Specifically, you will create an Ebola "situation report" using the code you wrote in the previous modules.

Format

This exercise guides you through tasks that you should perform in RStudio on your local computer.

Getting Help

There are several ways to get help:

1) Look for the "helpers" (see below)
2) Ask your live course instructor/facilitator for help
3) Schedule a 1-on-1 call with an instructor for "Course Tutoring"
4) Post a question in Applied Epi Community

Here is what those "helpers" will look like:

r fontawesome::fa("lightbulb", fill = "gold") Click to read a hint

Here you will see a helpful hint!

r fontawesome::fa("check", fill = "red") Click to see a solution (try it yourself first!)

linelist %>% 
  filter(
    age > 25,
    district == "Bolo"
  )

Here is more explanation about why the solution works.

Quiz questions

Answering quiz questions will help you to comprehend the material. The answers are not recorded.

To practice, please answer the following questions:

quiz(
  question_radio("When should I view the red 'helper' code?",
    answer("After trying to write the code myself", correct = TRUE),
    answer("Before I try coding", correct = FALSE),
    correct = "Reviewing best-practice code after trying to write it yourself can help you improve",
    incorrect = "Please attempt the exercise yourself, or use the hint, before viewing the answer."
  )
)

question_numeric(
 "How anxious are you about beginning this tutorial - on a scale from 1 (least anxious) to 10 (most anxious)?",
 answer(10, message = "Try not to worry, we will help you succeed!", correct = T),
 answer(9, message = "Try not to worry, we will help you succeed!", correct = T),
 answer(8, message = "Try not to worry, we will help you succeed!", correct = T),
 answer(7, message = "Try not to worry, we will help you succeed!", correct = T),
 answer(6, message = "Ok, we will get there together", correct = T),
 answer(5, message = "Ok, we will get there together", correct = T),
 answer(4, message = "I like your confidence!", correct = T),
 answer(3, message = "I like your confidence!", correct = T),
 answer(2, message = "I like your confidence!", correct = T),
 answer(1, message = "I like your confidence!", correct = T),
 allow_retry = TRUE,
 correct = "Thanks for sharing. ",
 min = 1,
 max = 10,
 step = 1
)

License

Please email contact@appliedepi.org with questions about the use of these materials.

Learning objectives

In this exercise you will:

Practice creating and rendering an R Markdown script
Use your existing code to create an Ebola R Markdown situation report
- An example of the output report is located in the "scripts/examples" subfolder
Adapt your R Markdown report to use dynamic text and params

Prepare

Open the R project "ebola", as usual, by clicking on the R project icon in the "ebola" folder. Open your script "ebola_analysis.R".

Create a new R Markdown script

Click "File -> New File -> R Markdown...".

knitr::include_graphics("images/1_gettingstarted.png", error = F)

In the window that appears:

Select "Document" from the left menu
Set the Title to "Surveillance report"
Select "HTML" from the lower radio button list
Select the checkbox that says "Use current date when rendering document"
Click "OK" (you can edit the title and author later)

Once the new R Markdown script appears, click "File -> Save As" and save it in the folder "ebola/scripts" with the name "ebola_sitrep.Rmd".

This file is a generic R Markdown template script that can be used to start making your own script. It uses datasets saved within the base R software (e.g. cars and pressure) to demonstrate basic R Markdown functionalities.

In this exercise, we will explore R Markdown functionalities and *transfer your existing R script into this R Markdown script*.

R Markdown overview

In this section, we will review the key features of an R Markdown script.

Even if you are already familiar with R Markdown, you may learn some new tips!

Open your R Markdown script.

quiz(caption = "Quiz - YAML",
  question("What differences from a traditional R script do you see in an R Markdown script?",
    allow_retry = T,
    answer("The top part has colons : and dashes ---", correct = T),
    answer("There are alternating background colours to different sections within the script (e.g grey, white)", 
           correct = T),
    answer("There are many tick marks... at the top/bottom of certain sections (e.g. grey sections have them but white sections don't)",
           correct = T), 
    answer("There are text sentences, but also R code",
           correct = T),
    answer("R Markdown scripts allow you to draw pictures in the script",
           correct = F)
  )
)

Buttons

r fontawesome::fa("eye", fill = "darkblue") Review the menu of new buttons appearing across the top of the R Markdown script:

Viewing modes

knitr::include_graphics("images/buttons_visual.png", error = F)

Source and Visual Modes - On the far upper-left of the R Markdown script, two buttons allow you to switch or "toggle" between "Source" and "Visual" views of the script. This is a feature of R Markdown scripts in RStudio that has become more popular just in recent years.

"Source" mode - This is the traditional R Markdown script view. You write in Markdown syntax. For example, to produce bullet points you write asterisks ( * ) and to produce static tables you use bars ( | ).
"Visual" mode - In this mode you see a quasi-preview of how the R Markdown report will appear when rendered. There are more buttons at the top, in a style similar to Microsoft Word, which will do the Markdown syntax formatting for you! E.g. to convert text to italics, bold, insert bullets, or even insert tables.

For example, to produce the above bullets, in Source mode we wrote the following syntax:

**Source and Visual Modes** - On the far upper-left of the R Markdown script, two buttons allow you to switch or "toggle" between "Source" and "Visual" views of the script. This is a feature of R Markdown scripts in RStudio that has become more popular just in recent years.  

  * **"Source" mode** - This is the traditional R Markdown script view. You write in Markdown syntax. For example, to produce bullet points you write asterisks ( * ) and to produce static tables you use bars ( | ).  
  * **"Visual" mode** - In this mode you see a quasi-preview of how the R Markdown report will appear when rendered. There are more buttons at the top, in a style similar to Microsoft Word, which will do the Markdown syntax formatting for you! E.g. to convert text to italics, bold, insert bullets, or even insert tables.

In contrast, if writing the above bullets using Visual Mode, you can simply type, and adjust the bullets, bold, etc using the buttons along the top.

knitr::include_graphics("images/visual_mode.png", error = F)

r fontawesome::fa("exclamation", fill = "red") Be careful! Visual mode is appealing, but most advanced R users prefer Source mode to have more control. In particular, Visual Mode handling of static tables and bullets can be quite buggy, especially if one switches back-and-forth between the modes.

Knit

knitr::include_graphics("images/buttons_knit.png", error = F)

Near the center along the top of the script is a blue "Knit" button. This button can be clicked to "render" the R Markdown's output (e.g. a Word Document, PDF, HTML, etc.).

We will practice knitting in just a moment. Note that this Knit button also has a drop-down menu on the right side, opened by clicking the downward arrow.

Insert Chunk

knitr::include_graphics("images/buttons_chunk.png", error = F)

Moving to the right, a small green button with a "+" symbol allows you to insert a "code chunk" into the R Markdown script. We will discuss this later, but a code chunk is where you can write R code that will be executed when the document is rendered.

Note the drop-down menu on the right side. In an R Markdown script you can also run code from other programming languages such as Python and SQL!

Run

knitr::include_graphics("images/buttons_run.png", error = F)

The Run button is very important - it can be used to run select parts ("code chunks") of your R Markdown script.

Click the arrow next to Run and see the drop-down menu. Review all the options. We can play with them later.

Outline

knitr::include_graphics("images/buttons_outline.png", error = F)

At the far right side is a grey button called "Outline". Clicking this will open a Table of Contents bar that allows you to navigate quickly through your script. As your script grows, it can be helpful to navigate and ensure your headings (the indentations in the ToC) are correct!

Producing a report

Producing the report is called "knitting" the report. You can think of this process as combining, or "stitching together" components of the R Markdown script, namely:

Titles, subtitles, and other headings
Text sentences, bullet points, etc.
R code and its outputs (plots, tables, loading packages, data cleaning processes, etc.)

The result is "rendered" (produced) as an output file. This could be a Word document, PDF, HTML document, etc.

Click the "Knit" button at the top of the R Markdown script to render the R Markdown document as an HTML file.

If no output appears or you see an error in the console, notify your instructor. On workplace computers, R Markdown can sometimes encounter errors due to writing permissions. We have documented some of these in this Epi R Handbook chapter.

r fontawesome::fa("eye", fill = "darkblue") Observe the output. Note how it contains elements that correspond to components in the R Markdown script:

A title, author line, and date line
R code
Headings (e.g. "Including plots")
Text sentences (e.g. "This is an R Markdown document.")
R code outputs (the plot)
Warning messages produced by the R code

Let us refresh ourselves on these components of the R Markdown script...

YAML

Note the section at the very top of the R Markdown script - the first 6 lines. This is the "YAML" section.

As a beginner R user, you should only have one YAML section in your R Markdown script, at the very top.

The YAML header

The YAML contains settings for the type of output to produce, formatting preferences, and other metadata such as document title, author, and date. These are written as key: value pairs, with the key and value separated by a colon followed by a single space.

YAML is delimited by three dashes on its top and bottom. YAML sections can be very simple, for example:

---
title: "Surveillance report"
output: html_document
---

However, they can become much more complex... we will learn about more advanced YAML as we work through this exercise!

Output formats

You can control the output produced with the output: key in the YAML. This is where you control whether the report is produced as a Word Document, PDF Document, HTML Document, dashboard, etc.

The YAML key output: can be edited to produce many types of outputs. Some common ones are below:

| Value | Output type |
|----------------------|---------------------------|
| html_document | HTML document / basic webpage |
| word_document | Word document |
| powerpoint_presentation | PowerPoint slides |
| pdf_document | PDF document (requires LaTeX) |
| flexdashboard::flex_dashboard | Basic dashboard (requires the {flexdashboard} package) |

We will cover many of these in this course.
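
The output format can also be chosen from the R console instead of the Knit button. The sketch below is a minimal illustration, assuming the {rmarkdown} package and pandoc are installed. The temporary file and its contents are made up for this example and are not part of the course files.

```r
library(rmarkdown)

# Write a tiny, hypothetical R Markdown file to a temporary location
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(
  c("---",
    'title: "Surveillance report"',
    "output: html_document",
    "---",
    "",
    "This is a demo document."),
  rmd_path
)

# Rendering needs pandoc, so only attempt it when pandoc is available
if (pandoc_available()) {
  # output_format overrides the output: key written in the YAML
  out_file <- render(rmd_path, output_format = "word_document", quiet = TRUE)
  print(basename(out_file))
}
```

Here the YAML asks for an HTML document, but the output_format argument requests a Word document for this one render, leaving the script itself unchanged.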

HTML

When you knitted the document the first time, the default output: html_document written in the YAML was used, producing an HTML file.

An HTML file is similar to a webpage. It can be viewed using an internet browser (even if a computer is offline) and therefore can include a customized look and interactive components such as buttons, tabs, scrollable table of contents, etc. An HTML file can be emailed as a static file, similar to a PDF or Word document. When the recipient receives it, the file will open in the internet browser - but the file is not online! It is only saved locally on their computer, but the internet browser is used to display the file. This is important to know, if you are concerned about sharing sensitive information in an HTML report.

Word Document

Update the output key in your YAML section to:

output: word_document

Be very careful with the spacing and exact spelling!!

The Word output format is very useful if you want to automate analytics but also have human interpretation written in after the report is rendered by R.

"Knit" the report with this setting, and review the output.

Note that to re-knit a Word Document you need to close the Word Document first, otherwise you will see this error:

Error message that R could not overwrite an open document:

pandoc.exe: ebola_sitrep.docx: withBinaryFile: permission denied (Permission denied)
Error: pandoc document conversion failed with error 1
Execution halted

PDF Document

Do not attempt to render to PDF right now. It may distract you from the rest of the exercise. Knitting directly to PDF can be difficult and you may encounter more challenges with formatting.

We will include instructions at the end of this exercise to help you produce a PDF.

Remember that if you have difficulty knitting to PDF from R, you can usually knit to Word and then save it as a PDF.

Other formats

Many other formats exist, for example:

Powerpoint presentations
HTML slides that can be hosted online (e.g. the slides in this course are made using the {xaringan} package in conjunction with {rmarkdown})
This tutorial was written in {rmarkdown}, with the {learnr} package
You can also write books (like the Epi R Handbook!), blogs/websites.

YAML syntax

YAML can also be more complex. See below for an example...

DO NOT UPDATE YOUR OWN YAML CODE TO REFLECT THIS EXAMPLE! Just look at it and answer the questions below.

---
title: "Surveillance report"
subtitle: "Hepatitis A"
author: "Lisa Epi"
date: "`r Sys.Date()`"
output: 
  html_document:
    theme: cerulean
    code_folding: show
    toc: yes
    toc_float:
      collapsed: true
params:
  state: Nebraska
  year: 2019
  midwest: true
  report_date: !r lubridate::ymd("2023-12-09")
---

See if you can answer the following questions about the above YAML:

quiz(caption = "Quiz - YAML",
  question("What is the default output type of this R Markdown?",
    allow_retry = T,
    answer("Word document", 
           correct = F),
    answer("Powerpoint slides",
           correct = F), 
    answer("HTML document", correct = T),
    answer("Shiny app",
           correct = F)
  ),
  question("What does the function Sys.Date() produce?",
    allow_retry = T,
    answer("The current date, as per your computer", correct = T),
    answer("The operating system being used", 
           correct = F),
    answer("The last date of symptom onset",
           correct = F), 
    answer("The last date of report of a case",
           correct = F)
  )
)

There are many optional YAML settings. In the YAML example above:

theme: sets the colors of the buttons, background, and overall aesthetic theme of an HTML report output
code_folding: show makes the R code sections in an HTML report "collapsible" by click, but shown by default
toc: and toc_float: adjust the Table of Contents

Read about more YAML options in the Posit R Markdown cheat sheet, among other places.
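
To see how indentation creates nesting, it can help to read a YAML header into R as a list. The sketch below is only an illustration, assuming the {yaml} package is installed (it is normally installed alongside {rmarkdown}). It uses a shortened copy of the example above, with the report_date line left out to keep things simple.

```r
library(yaml)

# A shortened copy of the example YAML, written as one text string
yaml_text <- paste(
  c('title: "Surveillance report"',
    "output:",
    "  html_document:",
    "    theme: cerulean",
    "    toc: yes",
    "    toc_float:",
    "      collapsed: true",
    "params:",
    "  state: Nebraska",
    "  year: 2019"),
  collapse = "\n"
)

# Parse the text into a nested R list
settings <- yaml.load(yaml_text)

names(settings)                        # the top-level keys
settings$output$html_document$theme    # a key nested two levels down
settings$params$year                   # a value stored under params
```

Each extra level of indentation in the YAML becomes one more $ step in the list.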

YAML errors

Unfortunately, YAML is notorious for errors and confusing error messages.

The number of spaces, indentation, and placement of the colons matters!
Tabs are not accepted but spaces are.

Review the YAML below, but do NOT copy it into your script. Answer the below questions about YAML formatting (and the key: value pairs):

---
title: "Surveillance report"
subtitle: "Hepatitis A"
author: "Lisa Epi"
date: "`r Sys.Date()`"
output: 
  html_document:
    theme: cerulean
    code_folding: show
    toc: yes
    toc_float:
      collapsed: true
params:
  state: Nebraska
  year: 2019
  midwest: true
  report_date: !r lubridate::ymd("2023-12-09")
---

quiz(caption = "Quiz - YAML",
    question("Where are colons placed? (select all that apply)",
    allow_retry = T,
    answer("After a key, followed by a single space",
           correct = T),
    answer("After any embedded R code", 
           correct = F),
    answer("After a value, if followed by an indented key",
           correct = F), 
    answer("After a value, only if followed by a *more* indented key",
           correct = T)
  ),
  question("Which ways are accepted to write that a YAML key should be set to 'true'?",
    allow_retry = T,
    answer("TRUE",
           correct = T),
    answer("'TRUE' (with quotation marks)",
           correct = F, message = "Quotation marks deactivate the logical status."),
    answer("true", 
           correct = T),
    answer("True",
           correct = T), 
    answer("TrUe",
           correct = F, message = "Random capital letters in the middle of a word prevent recognition as a special term")
  ),
      question("How many spaces are used in YAML indentation?",
    allow_retry = T,
    answer("4", correct = F),
    answer("3", 
           correct = F),
    answer("2",
           correct = T), 
    answer("1",
           correct = F)
  )
)

YAML error messages are often cryptic and difficult to interpret. A few examples are below:

You forgot a space after the colon that separates the key and the value

Error in yaml::yaml.load(..., eval.expr = TRUE) : 
  Scanner error: while scanning a simple key at line 3, column 1 could not find expected ':' at line 4, column 1
Calls: <Anonymous> ... parse_yaml_front_matter -> yaml_load -> <Anonymous>
Execution halted

Incorrect indentation

Error in yaml::yaml.load(..., eval.expr = TRUE) : 
  Scanner error: mapping values are not allowed in this context at line 7, column 15
Calls: <Anonymous> ... parse_yaml_front_matter -> yaml_load -> <Anonymous>
Execution halted

In the above errors, you often see reference to "line X" and "column Y" (X and Y being numbers). Look at line X of the YAML, and character Y on that line, to identify the mistake.
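
If you want to experiment safely, YAML text can be checked in the R console without knitting. The sketch below is a minimal illustration, assuming the {yaml} package is installed. The object names (good_yaml, bad_yaml, flags) are made up for this example.

```r
library(yaml)

# Correct: a single space follows each colon
good_yaml <- 'title: "Surveillance report"\noutput: html_document\n'
parsed <- yaml.load(good_yaml)
parsed$output

# Incorrect: the space after "output:" is missing
bad_yaml <- 'title: "Surveillance report"\noutput:html_document\n'
yaml_error <- tryCatch(
  yaml.load(bad_yaml),
  error = function(e) conditionMessage(e)  # keep the message instead of stopping
)
yaml_error   # a scanner error that names a line and a column

# Logical values: unquoted true is read as a logical, quoted 'TRUE' as text
flags <- yaml.load("midwest: true\nquoted: 'TRUE'\n")
class(flags$midwest)
class(flags$quoted)
```

Wrapping the call in tryCatch() lets you read the error message as ordinary text, which makes it easier to compare with the line and column it mentions.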

r fontawesome::fa("pen", fill = "brown") The truth is - mistakes in YAML are so easy to make, most R coders do not type their YAML from scratch. We copy-paste from a script that is known to work and make careful edits to it, knitting to test after each small change to the YAML.

The Render pane

In RStudio, the progress of rendering (knitting) of the report, and any error messages display in the Render tab. If you get confused, remember that you can always click to return to the Console tab.

knitr::include_graphics("images/render_console.png", error = F)

What does YAML stand for, you might ask? It originally stood for "Yet Another Markup Language", and today it officially stands for "YAML Ain't Markup Language". Somebody had fun creating that name!

Prepare your YAML

Edit your YAML to look exactly like this (with your name in the author: section). Note that the colons and indenting must be exactly correct, with one space after a colon and two spaces for an indentation.

---
title: "Situation Report"
subtitle: "Ebola outbreak in Sierra Leone"
author: "(Your name or agency here)"
output:
  word_document: default
date: "`r Sys.Date()`"
---

Note that for this exercise we will build a Word document report.
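
The date: line in this YAML runs Sys.Date() each time the report is knit. The sketch below shows what that function returns when run in the console. The format strings are an illustration of one possible style, not something the exercise requires.

```r
# Sys.Date() returns today's date according to your computer's clock
today <- Sys.Date()
class(today)               # "Date"

# The default printed style is year-month-day
format(today)

# Other styles are possible: day, full month name, four-digit year
format(today, "%d %B %Y")
```

If you prefer a different style in the report, the same format() call could be placed inside the backticks of the date: line, for example `r format(Sys.Date(), '%d %B %Y')`.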

Code chunks

Look at the whole template R Markdown script. Aside from the YAML, R Markdown scripts have two major components:

1) "Chunks" of R code
2) Sections of text ("Markdown" text)

The code in this template is divided into several code chunks:

The first "setup" chunk controls how R code and outputs are displayed in the report.
The second chunk prints a summary table of the cars dataset
The final chunk prints a plot of the pressure dataset

Note: the cars and pressure are example datasets saved within R.

We will now edit this template to include code for our Ebola situation report.

Running a code chunk

Re-knitting the report as we make small changes can be tedious, and is unnecessary. As we edit our report, we can run specific R code chunks to see the impact of changes to the R code.

Go to the chunk that contains the summary() command and press the green arrow on the upper-right corner.

knitr::include_graphics("images/chunk_play.png", error = F)

This action will run all the R code written within the chunk.

Run the next chunk, with the plot() command, the same way.

quiz(caption = "Quiz - knitting Rmd",
  question("Where did the plot appear?",
    allow_retry = T,
    answer("In the RStudio Viewer pane", correct = F),
    answer("In the RStudio Environment pane", 
           correct = F),
    answer("In the R Markdown, below the code chunk",
           correct = T), 
    answer("In the R Console",
           correct = F)
  )
)

In R Markdown, a preview of plots and other outputs appears underneath the code chunk. If one code chunk produces multiple outputs, they appear next to each other and you can click between them.

Run code line-by-line

As in a normal R script, you can highlight certain lines of R code to run.

To run the line(s) you have the following options:

1) Highlight the lines, and press Ctrl+Enter on the keyboard, or
2) Press the "Run" drop-down menu in the top-right of the script, select "Run selected lines"

Practice running the summary() and plot() lines using one of the above methods.
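
For reference, the two template commands behave the same way when run on their own in the console. This is a small sketch, and the object name cars_summary is made up for this example.

```r
# cars and pressure are example datasets that ship with R
cars_summary <- summary(cars)   # summary statistics for each column
print(cars_summary)

plot(pressure)                  # scatter plot of the pressure dataset
```

Run inside a chunk, the table and the plot would both be previewed below that chunk. Run in the console, the table prints there and the plot opens in the Plots pane.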

Creating a code chunk

Note how a code chunk begins with three "backticks" and "curly brackets" with the letter r inside, and ends with three backticks as well.

Note that backticks are NOT single quotation marks. See the difference:

`` (two backticks)
' ' (two single quotation marks)
" " (two double-quotation marks)

It might take some time to find this backtick key on your keyboard. On US and UK keyboards it is often near the ESC, 1, or ~ key (for other keyboard layouts, see this guide).

Luckily, you do not need to type backticks to create a code chunk!

To easily insert a new code chunk, place your cursor in the script in the middle of some empty lines. Then, press the small green button (with a + symbol) near the top-right of the script to insert a new R code chunk!

knitr::include_graphics("images/buttons_chunk.png", error = F)

Alternatively, press Ctrl + Alt + i (or Option + Command + i on a Mac) to insert a chunk.

Think of this chunk like a miniature R script. Write R code between the backtick lines exactly as you would in a traditional R script.

You can also write R code that does not print an output. The chunks in the image below load packages and import a dataset. Note that comments can be added with # inside a chunk, just as in a normal R script.

knitr::include_graphics("images/chunk_no_output.png", error = F)

Add Ebola code chunks

Your turn! Practice adding a code chunk to the document.

Below the "setup" chunk, insert a new code chunk and paste your pacman::p_load() command from your ebola_analysis.R script.

# This chunk loads packages
pacman::p_load(
     rio,          # for importing data
     here,         # for locating files
     skimr,        # for reviewing the data
     janitor,      # for data cleaning  
     epikit,       # creating age categories
     gtsummary,    # creating tables  
     RColorBrewer, # for colour palettes
     viridis,      # for more colour palettes
     scales,       # percents in tables  
     flextable,    # for making pretty tables
     gghighlight,  # highlighting plot parts  
     ggExtra,      # special plotting functions
     tidyverse     # for data management and visualization
)

Then, add another code chunk below it, with your code to import the surv_raw dataset.

# This chunk imports data
surv_raw <- import(here("data", "raw", "surveillance_linelist_20141201.csv"))

Try running these chunks by clicking their green "play" buttons.

Do you expect any printed output below the chunks? We will discuss how to handle any warning messages soon.

Running order

Now, save your R Markdown script.

Next, go to the RStudio "Session" menu tab, and click "Restart R"

Nothing you ran before has been saved. It is a clean slate. No packages are loaded, and no datasets are in your Environment.

Now try to run only the chunk that contains the import() command.

quiz(caption = "Quiz - knitting Rmd",
  question("Where did the error message appear (select all that apply)?",
    allow_retry = T,
    answer("In the RStudio Viewer pane", correct = F),
    answer("In the RStudio Environment pane", 
           correct = F),
    answer("In the R Markdown, below the code chunk",
           correct = T), 
    answer("In the R Console",
           correct = T)
  ),
  question("Why did this error occur?",
    allow_retry = T,
    answer("Misspelling of import()", correct = F),
    answer("The rio package was not loaded before running import()", 
           correct = T),
    answer("The dataset does not exist",
           correct = F), 
    answer("R is occasionally angry and refuses to import data",
           correct = F)
  )
)

This reinforces the message that although each chunk looks independent, they all share the same R Environment. Moreover, when the script is knit, R starts from a clean Environment and runs the chunks in order, from top to bottom.

On the import() chunk, click the icon next to the green "Play" icon - if you hover it says "Run All Chunks Above". This is a useful button to know.

Run all the chunks above and then run the import() chunk.
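
The same kind of error can be reproduced in a small, self-contained way. In the sketch below, import_not_loaded_demo() is a made-up function name that stands in for any function whose package has not been loaded yet.

```r
# Calling a function that R cannot find produces an error;
# tryCatch() keeps the error message as text instead of stopping
result <- tryCatch(
  import_not_loaded_demo("surveillance_linelist_20141201.csv"),
  error = function(e) conditionMessage(e)
)
print(result)

# Writing package::function works without loading the package first;
# utils::head() is used here only as an example
first_rows <- utils::head(cars, 3)
print(first_rows)
```

This is why the chunk that loads packages must run before any chunk that uses their functions, both when you run chunks by hand and when you knit.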

Formatting the text

An R Markdown script is first and foremost producing a document. Text that you write on lines in the script will appear as text in the output - sentences, paragraphs, bullet points, etc. These words can be customised into italics, bold, static tables with cells, etc.

Later, we will show you how code can be embedded within your document - this is the power of R Markdown!

Add sections

You can include sections in the report with "headings" using # hash symbols.

# Biggest heading
## A sub-heading
### A sub-sub-heading
#### An even smaller heading

Headings will appear in the "outline" of the report. Click the "outline" button in the upper-right of the script to see this outline and the indentations by heading level.

quiz(caption = "Quiz - Headings",
  question("What are the headings in the template R Markdown script?",
    allow_retry = T,
    answer("Including plots", correct = T),
    answer("title", 
           correct = F, message = "No, title is part of the YAML."),
    answer("setup",
           correct = F, message = "No, setup is a chunk name not a heading. There is no # symbol in front of it."),
    answer("cars",
           correct = F, message = "No, cars is a chunk name not a heading. There is no # symbol in front of it."),
    answer("pressure",
           correct = F, message = "No, pressure is a chunk name not a heading. There is no # symbol in front of it."),
    answer("R Markdown",
           correct = T)
  )
)

Make a "Summary" section of your report

Add the following headings into your report, *below the import() code chunk*.

# Summary

## About this report

## Key numbers

Add text and bullet points

Formatting text in Source mode works like this:

For bold text, surround the word(s) with two asterisks (**)
For italic text, surround the word(s) with either single asterisks (*) or underscores (_)
For bullets, use either asterisks (*) or dashes (-) followed by a space
For sub-bullets, indent and then use either asterisks or plus symbols (+)
NOTE: For proper formatting, there must be an empty line above the first bullet. In some situations it also can help to put two spaces at the end of each bullet before pressing your Enter/Return key.

Here is an example (you do not need to type this):

knitr::include_graphics("images/2_text.png", error = F)

Now, write this text in the new sections of your report:

1) In the "Summary" section, write:

This is a demo situation report on a hypothetical outbreak of Ebola in Sierra Leone from 2014. It uses simulated data.

2) In the "About this report" section, write:

This report was produced on CURRENT_DATE.

In the place of CURRENT_DATE above, write the current date (in any format you prefer).

3) In the "Key numbers" section, write (in bullet points):

Key points about this outbreak:

In total, there have been 539 cases reported.
The first case was reported on 2014-06-16.
The last case was reported on 2014-11-27.

Pay attention to the formatting of the bullets! Re-knit to check that it works. Don't forget to put an empty line between the top phrase and the first bullet, and two spaces at the end of each bullet!

Re-knit your report and view the output Word document. Remember, if you receive "Error 1", you likely need to close the Word document before knitting.

Chunk options

Would you submit this report to a decision-maker? Surely not.

One problem is that the R code and warning messages are printed in the report alongside the outputs!

Display of the code in the report output is controlled through the echo = option for each chunk. The easiest way to control this option for all chunks is by setting a default.

Scroll up to the top of the script and see the top-most "setup" chunk with this code in it:

knitr::opts_chunk$set(echo = TRUE)

This "setup" chunk appears by default in any R Markdown template. The function knitr::opts_chunk$set() sets the default options for all the chunks.

You saw the R code in the report because the echo = default is set to TRUE.

Change this setting to echo = FALSE and re-knit the report to see the result. You should see that the R code being run is no longer appearing ("echoing") in the report output.
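
If you would like to confirm which default is in effect, one option is to query {knitr} from the Console. This is only a small sketch, assuming the {knitr} package is installed; the object names `old_echo` and `echo_now` are made up for illustration.

```r
# Store the current default so it can be restored afterwards
old_echo <- knitr::opts_chunk$get("echo")

# Set the default for all chunks, as you would in the setup chunk
knitr::opts_chunk$set(echo = FALSE)

# Check which default is now in effect
echo_now <- knitr::opts_chunk$get("echo")
echo_now

# Restore the earlier value (only relevant when experimenting in the Console)
knitr::opts_chunk$set(echo = old_echo)
```

In your report you only need the single knitr::opts_chunk$set() line inside the setup chunk; the rest is for exploring.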

Individual chunks

Even after specifying the defaults, you can still adjust chunk options for individual chunks within their curly brackets { }, after the r and after a comma.

knitr::include_graphics("images/chunk4.png", error = F)

quiz(caption = "Quiz - Test your understanding of chunk options",
  question("In the example above, what will appear in the knitted report?",
    allow_retry = T,
    answer("Only the math equations will show.", correct = F),
    answer("The math equations and their answers will show.", correct = F),
    answer("Only the answers will show", correct = T), 
    answer("Neither the equations, nor the answers will show.", correct = F)
  )
)

In the example chunk above, the option echo = is set to FALSE for this chunk only. It means that:

The 2 + 2 and 10 / 5 will not appear in the report output. Only the printed results of those R commands (4 and 2) will appear in the report.

The gear icon

If you prefer a "point-and-click" approach, set a chunk's options by clicking the small "Gear" icon on the right side of the chunk and selecting from the "Output" drop-down menu. It will set the corresponding options for the chunk.

knitr::include_graphics("images/chunk_options.png", error = F)

Try adjusting the options for the chunks. See the differences between chunks that contain a command with no output (e.g. importing data) and those with an output (e.g. the plot() command). Re-knit the report as many times as you need to understand how these options work.

quiz(caption = "Quiz - chunk options",
  question("Which option is listed if you select ‘Show code and output’?",
    allow_retry = T,
    answer("show = TRUE", correct = F),
    answer("output = TRUE", 
           correct = F),
    answer("echo = TRUE",
           correct = T), 
    answer("include = TRUE",
           correct = F)
  )
)

quiz(caption = "Quiz - chunk options",
  question("What does the option ‘include = FALSE’ do?",
    allow_retry = T,
    answer("Prevents the report from knitting", correct = F),
    answer("Makes the document non-inclusive of visually-impaired readers", 
           correct = F),
    answer("The code will be run, but no outputs will be shown",
           correct = T), 
    answer("This code is not run",
           correct = F)
  )
)

Can you edit your code now so the default is to hide all messages and warnings?

Click to see a solution (try it yourself first!)

knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)

The options message = and warning = are set to FALSE.

The options you select may vary by the context:

For a short situation update or public document, you would likely not "echo" the R code into the report.
For an analytic report with a technical audience, you might make the code available so that other scientists or readers can quickly inspect your process.

Before proceeding, ensure that your setup chunk (at the top of the script) looks like this, and no other chunks have different settings:

knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)

Chunk names

There is an optional "naming" feature for code chunks. We encourage R Markdown beginners to not use this feature because it can lead to frequent errors if you are not careful.

At the end of this exercise we explain more about this functionality.

Build your report

Now that you know about YAML, knitting, code chunks, and text... begin to make your Ebola situation report!

Remove all the remaining example headings, text and chunks from the R Markdown template (e.g. the plot() and summary() commands that use the cars and pressure data).

Be sure to keep the YAML, the setup chunk, the outbreak "Summary" headings and text you have added, and the chunks you created for loading packages and importing the raw linelist.

Transfer the important R code from your "ebola_analysis.R" script into this R Markdown script. A few tips:

Use chunks to organise your code - do not put ALL the code into one chunk.
Test your progress often by knitting your report after adding a chunk
Keep your # comments in the code chunks to inform readers
Be aware of which dataset you are working on (e.g. surv_raw or surv) and the ordering of the chunks

Follow this order for your code chunks, where each number represents a code chunk:

1) Setup (chunk settings)
2) Load packages
3) Import the raw surveillance linelist
4) (optional) Exploratory analysis (set eval = FALSE in the options for this chunk, so that no outputs print to the final report)
5) Clean and export the surveillance linelist
6) Create and print descriptive tables (put each table in its own chunk)
7) Create and print plots (put each plot in its own chunk)
8) (optional) Testing chunk (set eval = FALSE so no outputs print to the report)

For the plots and tables, choose just 3-5 tables and plots from your ebola R script to highlight in this report. Give them appropriate headings/subheadings.

Do not include all of the exploratory analysis or testing code from your "ebola_analysis.R" script. Keep the report relatively minimal.

Ask your facilitator for assistance if you get stuck!

Also, do not forget to save your R Markdown report as you work!

Inline code

The text you wrote in the "Summary" section is static. It will not change when new cases are added to the linelist and the report is re-run (unless you make manual changes in the R Markdown script).

R Markdown allows you to embed R code into text, so that numbers automatically update.

By writing "inline code", you create a small segment of R code within some text:

1) Start the code with a single backtick and the letter r and a space.
2) Continue by writing your R code.
3) Finish the inline code portion with another single backtick.

Remember that a "backtick" is not the same as a single quote/speech mark ('). If you are not sure where to find a backtick on your keyboard, ask an instructor. Sometimes it is on the same key as a ~ tilde.

Today's date

A simple inline code example is to insert the current date into a sentence, such that it updates automatically each time the report is knit.

The function Sys.Date() prints the current date (note the capital "D"). This is from {base} R and does not rely on any objects, so it can be placed anywhere in the script.

Update the sentence in the section "About this report" so that it reads:

## About this report
This report was produced on `r Sys.Date()`.

Now, re-knit the report and see if the correct date appeared in the sentence.

Number of rows in the linelist

In the "Key numbers" section, the first bullet includes the number of cases (rows). How will you use inline code so that this number updates automatically?

Click to read a hint

Use the base R function nrow() to print the number of rows in the surv linelist.

Click to see a solution (try it yourself first!)

* In total, there have been `r nrow(surv)` cases reported.

Did you see an error message like this?

knitr::include_graphics("images/execution_halted.png", error = F)

Click the "Issues" button to see the detailed error message and line(s) that caused the problem.

knitr::include_graphics("images/render_issues.png", error = F)

Think for a moment - the R Markdown script renders from the top to the bottom. So, WHERE in the R Markdown script can you run this code that references the *clean surv dataset*?

quiz(caption = "Quiz - Order",
  question("What should you do to fix this problem?",
    allow_retry = T,
    answer("Not have a Summary section of this report", correct = F),
    answer("Manually adjust the Summary section in the R Markdown each time",
           correct = F), 
    answer("Edit the Summary section in the Word output each time",
           correct = F),
    answer("Move the Summary section below the data cleaning chunk", 
           correct = T)
  )
)

This inline code must be placed below the cleaning chunk! Otherwise, the code will not be able to find the clean surv object.

Move the entire "Summary" section below the data cleaning R code chunk. Since all the chunks are set to echo = FALSE, the Summary will still appear at the top of the Word output.

The new order of your script should be:

1) Setup (chunk settings)
2) Load packages
3) Import the raw surveillance linelist
4) Exploratory analysis (optional)
5) Clean and export the surveillance linelist
6) Summary of the outbreak
7) Create and print descriptive tables
8) Create and print plots

Ensure that this code works by re-knitting the report.

Max and min

Convert the second and third "Key Numbers" bullets to show the minimum and maximum values from the date_report column in the surv data frame.

Click to read a hint

Use the base R functions max() and min(). Reference the column as surv$date_report
Don't forget the argument na.rm = TRUE to ensure that any missing dates of report do not make the output NA.

Click to see a solution (try it yourself first!)

* The first case was reported on `r min(surv$date_report, na.rm=T)`.   
* The last case was reported on `r max(surv$date_report, na.rm=T)`.
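
To see why na.rm = TRUE matters, here is a minimal sketch using a small made-up vector of dates (not the course linelist). The names `example_dates`, `first_report`, and `last_report` are illustrative only.

```r
# A small made-up vector of report dates, including one missing value
example_dates <- as.Date(c("2014-06-16", NA, "2014-11-27", "2014-08-02"))

# Without na.rm = TRUE, the missing value makes the result NA
min(example_dates)

# With na.rm = TRUE, missing values are ignored
first_report <- min(example_dates, na.rm = TRUE)
last_report  <- max(example_dates, na.rm = TRUE)

first_report
last_report
```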

Return a specific number of cases and their percent

In the outbreak summary, we want to highlight the number of children under 5 years that are confirmed cases.

How would you add text to the first bullet that reports the number of cases under age 5 years?

Here are some options:

1) Option 1: You could write a longer R command in the inline code, using pipes, filter(), and nrow(), such as:

* In total, there have been `r nrow(surv)` cases reported, including `r surv %>% filter(age_years < 5) %>% nrow()` children under 5 years.

2) Option 2: You could create a separate R code chunk above, to create and store the number for later use:

under_5_count <- surv %>%
  filter(age_years < 5) %>%
  nrow()

Then in the bullet point, reference the value within the inline R code, such as:

* In total, there have been `r nrow(surv)` cases reported, including `r under_5_count` children under 5 years.

3) Option 3 (recommended): For this specific scenario, we suggest using the helpful function called fmt_count() developed by some of our Applied Epi team members. It is in the {epikit} package. It will return the number AND the percentage of these rows, nicely formatted!

fmt_count() # from the {epikit} package is built for exactly this scenario

Ensure the {epikit} package is in your pacman::p_load() command, and that it is loaded for use. Now, search for "fmt_count" in the RStudio Help pane and read about how it works. Use it in your inline code.

Click to see a solution (try it yourself first!)

Here is your markdown text, with the inline R code:

* In total, there have been `r nrow(surv)` cases reported, including `r fmt_count(surv, age_years < 5)` children under 5.

Re-knit the report, and you will see that the number of these cases, and their percent of the total rows, have been printed automatically - how neat!
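
If you want to understand what is being counted, the sketch below reproduces the logic of Options 1 and 2 on a small made-up data frame, and calculates the percent by hand. It assumes {dplyr} is installed; `toy_surv` and `under_5_pct` are illustrative names and the ages are invented.

```r
library(dplyr)

# A small made-up linelist (not the course data)
toy_surv <- data.frame(
  case_id   = 1:5,
  age_years = c(2, 34, 4, 61, 17)
)

# Count the rows where age is under 5 years
under_5_count <- toy_surv %>%
  filter(age_years < 5) %>%
  nrow()

# Percent of all rows, rounded to one decimal place
under_5_pct <- round(100 * under_5_count / nrow(toy_surv), 1)

under_5_count
under_5_pct
```

Note that filter() drops rows where age_years is missing, so those rows are not counted as under 5.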

The `pull()` function

What if you want to add a bullet point with the number of hospitals which have reported cases?

Add a static new (fourth) bullet point:

* There are X hospitals that have reported cases.

If you are not already familiar with it, explore the tidyverse function pull(). This extracts a single column from a data frame (like a tidyverse version of the $ operator).

Add a new code chunk to the "Key numbers" section, just below its heading. You can do this with (Ctrl+Alt+i), or the green insert chunk button.

In this chunk, create an object that stores the number of affected hospitals, called hosp_num. Use piping, pull() and the base R functions unique(), na.omit(), and length() to achieve this. Attempt this yourself, then view the solution.

Click to see a solution (try it yourself first!)

# extract the hospital column, reduce to the unique values, then return the length of that vector (i.e. the number of unique hospital names)
hosp_num <- surv %>% # begin with the surv dataset
  pull(hospital) %>%  # extract only the hospital column
  unique() %>%        # use the unique values
  na.omit() %>%       # remove NA
  length()            # return only the number

Explore how each line is changing the hosp_num output, by highlighting and running the command to include each subsequent line (add one line at a time). Remember that you can highlight and run specific lines in a code chunk using Ctrl and Enter.
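
The same step-by-step exploration can be sketched on a tiny made-up data frame, which makes it easier to see what each function contributes. This assumes {dplyr} is installed; `toy_surv`, `toy_hosp_num`, and the hospital names are invented for illustration.

```r
library(dplyr)

# A small made-up linelist with one missing hospital
toy_surv <- data.frame(
  hospital = c("Port Hospital", "Central Hospital", NA,
               "Port Hospital", "Military Hospital")
)

# Add one step at a time and look at the result
toy_surv %>% pull(hospital)                             # all 5 values, including NA
toy_surv %>% pull(hospital) %>% unique()                # 4 values: 3 names and NA
toy_surv %>% pull(hospital) %>% unique() %>% na.omit()  # 3 names, NA removed

# The full chain returns a single number
toy_hosp_num <- toy_surv %>%
  pull(hospital) %>%
  unique() %>%
  na.omit() %>%
  length()

toy_hosp_num
```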

Now, update the fourth bullet text with inline code so that the number of hospitals is presented dynamically

Click to see a solution (try it yourself first!)

* There are `r hosp_num` hospitals that have reported cases.

paste()

How would you extend the last bullet point to include the names of the reporting hospitals?

Use the functions taught above to extract a vector of unique hospital names and save it as hosp_names.

Then, in the bullet text, within the in-line code, use the paste() function from base R and its collapse = argument (recall the jurisdictions exercise from Module 1!) to print the names.

Click to see a solution (try it yourself first!)

# number of hospitals
hosp_num <- surv %>% 
  pull(hospital) %>% 
  unique() %>%
  na.omit() %>% 
  length()

# names of the hospitals
hosp_names <- surv %>% 
  pull(hospital) %>% 
  unique() %>% 
  na.omit()           # remove NA from the vector

Then, in the bullet text:

* There are `r hosp_num` hospitals that have reported cases: `r paste(hosp_names, collapse = ", ")`.
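
As a quick illustration of what collapse = does, here is a sketch with a made-up vector of names (`toy_names` and `joined_names` are illustrative, and the hospital names are invented).

```r
# A made-up vector of three hospital names
toy_names <- c("Central Hospital", "Military Hospital", "Port Hospital")

# Without collapse, paste() returns three separate strings
paste(toy_names)

# With collapse, the values are joined into one string using the separator
joined_names <- paste(toy_names, collapse = ", ")
joined_names
```

The single joined string is what reads naturally inside a sentence.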

Some advanced tips for this type of reporting:

As an alternative to na.omit(), convert NA to a word such as "Unknown" within mutate() using the function fct_na_value_to_level() from {forcats}
Use fct_infreq() from {forcats} to convert the column to a "factor" class - this sets a specific intrinsic order to the values, in this case by their frequency. We will discuss factors more in later modules.

The advanced code below stores the hospital names in order of their frequency in the data.

# names of the hospitals in order of frequency
hosp_names <- surv %>% 
  mutate(hospital = fct_infreq(hospital)) %>%  # convert to factor and order by frequency
  mutate(hospital = fct_na_value_to_level(hospital, "Unknown")) %>% # convert NA to "Unknown"
  pull(hospital) %>%  # extract the column only
  levels()   # extract the levels of the factor

Dynamic text within an R command

As discussed above, inline code is used to place short R commands within normal markdown text.

But what about combining text and code together within the output of an R command, such as a table or plot title? In the ggplot module we briefly introduced the str_glue() function from the {stringr} package to do this.

Note the difference:

Inline R code - Used in markdown text (not in a code chunk)
str_glue() - Used within an R command (e.g. in a code chunk)

As the name suggests, str_glue() helps "glue" together character strings (text) with R code within the context of an R command. It is often used for creating dynamic plot captions as part of a ggplot() command.

Things to remember with str_glue():

The entire phrase is written between double quotation marks within the function, like str_glue(" TEXT AND CODE HERE ")
Any code or pre-defined values are placed within curly brackets { } within the double quotation marks. There can be many curly brackets in the same str_glue() command.

A simple example, of a dynamic plot caption, is below:

ggplot(
  data = surv,
  mapping = aes(x = date_onset))+
geom_histogram()+
labs(caption = str_glue("The data include {nrow(surv)} cases."))

Look for a moment at only the str_glue() function in the above command:

str_glue("The data include {nrow(surv)} cases.")

Note the following:

The contents of str_glue() are all within double quote marks
The R code nrow(surv) is within curly brackets { } within the quote marks
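
You can also try str_glue() on its own, outside of a plot. The sketch below assumes {stringr} is installed and uses a stored number in place of nrow(surv); `n_cases` and `caption_text` are illustrative names.

```r
library(stringr)

# A stored value to embed in the text (stands in for nrow(surv))
n_cases <- 539

# Code inside the curly brackets is evaluated and inserted into the text
caption_text <- str_glue("The data include {n_cases} cases.")
caption_text
```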

Advanced `str_glue()` commands

More complex str_glue() commands could contain multiple code parts, like this:

ggplot(
  data = surv,
  mapping = aes(x = date_onset))+
geom_histogram()+
labs(caption = str_glue("The data include {nrow(surv)} cases and are current to {format(max(surv$date_report, na.rm=T), '%d %b %Y')}."))

Note the following:

There are two parts of embedded code, both within curly brackets { }. The second one returns the latest date of report, formatted to print as DAY MONTH YEAR.
Because the format() function itself required the use of quote marks, single quote marks are used in order to not "break" the surrounding double quotes used by str_glue()

An alternative format for str_glue(), which can be easier to read, is to use placeholders within the { } brackets, and define each of them in separate arguments at the end of the str_glue() function, as below. \n is used to force a new line.

ggplot(
  data = surv,
  mapping = aes(x = date_onset))+
geom_histogram()+
labs(
  caption = str_glue(
    "Data include {num_cases} cases and are current to {current_date}.\n{n_missing_onset} cases are missing date of onset and not shown",
    num_cases = nrow(surv),
    current_date = format(Sys.Date(), "%d %b %Y"),
    n_missing_onset = fmt_count(surv, is.na(date_onset))
    )
  )

Explore the use of format() and strptime % symbols to adjust date display in our Epi R Handbook chapter on Dates.
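
As a brief sketch of how the % symbols work, the base R code below formats one fixed date in a few ways. The object names are illustrative, and month and weekday names will print in the language of your computer's locale.

```r
# A fixed example date
example_date <- as.Date("2014-11-27")

# %d = day, %b = abbreviated month name, %Y = 4-digit year
format(example_date, "%d %b %Y")

# %A = full weekday name, %B = full month name
format(example_date, "%A %d %B %Y")

# %m = month as a number, so this version does not depend on the locale
numeric_style <- format(example_date, "%d/%m/%Y")
numeric_style
```

Note that format() returns text (class character), so apply it only at the point of display, not to date columns you still need to calculate with.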

Your turn!

Practice adding a caption or subtitle to a plot in your report, using your choice of the code above. Type the code yourself (do not copy and paste) so that you get a firm understanding of the syntax!

Params

R Markdown parameters ("params") allow you to tailor a generic report, or report section, to a specific jurisdiction, time period, or other aspect which you can control with the YAML.

This is done with params in the YAML. For example, we could add a param to our YAML that contains a district name, then use that param to filter the entire report to only analyse data from that one district.

Add further lines to your YAML as shown below. Note that the colons and indenting must be exactly correct.

---
title: "Situation Report"
subtitle: "Ebola outbreak in Sierra Leone"
author: "(Your name or agency here)"
output:
  word_document: default
date: "`r Sys.Date()`"
params:
  district: "West II" 
---

This creates a hidden "param" object called params$district with the value "West II" which you can use in your report. You can set this value at the top of the script - in the YAML - and the effects can cascade throughout the report.

Filter the data by district

In the YAML, you have written the parameter district: "West II". You can access this value in the script via params$district.

Add a section near the bottom of your R Markdown script that holds a "spotlight analysis" on the district which is set in the YAML params.

Make a Heading and an introductory sentence which introduces this section as a spotlight on X district. Use in-line code so that the district name updates dynamically to print params$district.
In a code chunk below this text, begin with the clean surv data frame and pipe it into a filter() function that restricts it to only rows in which the column district is equal to params$district.
Then pipe this filtered data frame into a tabyl() function that shows the breakdown by hospital and by sex.

Click to see a solution (try it yourself first!)

In the R Markdown text you would write something like:

Below is a spotlight on hospital admissions for patients reported in
`r params$district` district.

And in the code chunk you could have code like this:

# Spotlight on district defined at top of script in YAML  
surv %>% 
  filter(district == params$district) %>%   # filter data frame to the district in YAML
  tabyl(hospital, sex) %>%                  # begin the cross-tabulation    
  adorn_totals("both") %>%                  # add totals for both rows and columns
  qflextable()

Note the double equals to test equivalency, ==

Now you, or a colleague, can update the param in the YAML and have the spotlighted district change automatically.

Knitting with parameters

You can make this process more user-friendly by using the "Knit with Parameters" option.

First, edit your YAML to be exactly like this. Pay attention to indentations (2 spaces each) and colon placement.

This will allow the user to choose a value for params$district from a pre-defined list.

---
title: "Situation Report"
subtitle: "Ebola outbreak in Sierra Leone"
author: "(Your name or agency here)"
output:
  word_document: default
date: "`r Sys.Date()`"
params:
  district:
    input: select
    value: "West II"
    choices: ["West I", "West II", "West III", "Mountain Rural"] 
---

Now click the "Knit with Parameters" option.

knitr::include_graphics("images/knit_params.png", error = F)

You should see a pop-up window allowing you to select one of the districts from the choices YAML key.

With this interface, one of your colleagues who does not know R could even run the report!
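
Params can also be supplied from R code rather than the pop-up window, via the params = argument of rmarkdown::render(). The sketch below is one possible approach, not part of the exercise: the helper functions `sitrep_file_name()` and `render_sitrep()` and the file name "my_report.Rmd" are hypothetical, and the rendering line is commented out because it needs your own .Rmd file.

```r
# District names taken from the choices in the YAML above
districts <- c("West I", "West II", "West III", "Mountain Rural")

# Hypothetical helper: build an output file name from a district name
sitrep_file_name <- function(district) {
  paste0("sitrep_", gsub(" ", "_", district), ".docx")
}

# Hypothetical helper: knit the report once for a given district
render_sitrep <- function(district, rmd_path) {
  rmarkdown::render(
    input       = rmd_path,
    params      = list(district = district),  # overrides the YAML value
    output_file = sitrep_file_name(district)
  )
}

# Preview the file names that would be produced
sapply(districts, sitrep_file_name)

# Example use (needs your own .Rmd file, so it is commented out here):
# for (d in districts) render_sitrep(d, rmd_path = "my_report.Rmd")
```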

End

Congratulations! You have written an R Markdown situation report with lots of interesting R code!

You can compare your R Markdown to our "backup" one in the "scripts/backup" subfolder.

Continue to the Extras if you have time and interest... In particular, for more advanced params usage!

Extra - Date params

This is a somewhat advanced section. If this confuses you, simply focus on understanding the foundational R Markdown techniques addressed in earlier sections.

A more advanced technique is to have params that handle dates. Add the publish_date param to your YAML and test it.

---
title: "Situation Report"
subtitle: "Ebola outbreak in Sierra Leone"
author: "(Your name or agency here)"
output:
  word_document: default
date: "`r Sys.Date()`"
params:
  district:
    input: select
    value: "West II"
    choices: ["West I", "West II", "West III", "Mountain Rural"] 
  publish_date: !r lubridate::ymd("2014-12-01")
---

Note the use of !r to indicate the use of R code in a param. Also note that any functions used to interpret a param (e.g. ymd() from {lubridate}) must be explicitly called with the double colon (::) syntax above. This is because the YAML is run prior to your package loading command. The double colon syntax loads the package for use in that line.

You can now use params$publish_date, of class "Date", in your report.

Change the Summary bullet "About this report" to display params$publish_date instead of Sys.Date().

Again, click "Knit with Parameters", and see how this param can be selected by the user via a calendar interface.

Select an appropriate date cutoff

Let us add a filter on surv, using params$publish_date, such that only cases reported prior to a certain date are included in the report.

Assume the following:

Epidemiological weeks for this outbreak begin on Mondays
There are typically 3 days of delay in the reporting of cases into this linelist

We will apply this threshold: Only include cases from the most recent complete and reliable epidemiological week, accounting for a 3-day delay in reporting.

Using the calendar below:

knitr::include_graphics("images/calendar.png", error = F)

If the report is published the morning of Thursday 27 November, the most recent reliable day of data is Sunday 23 November, and the most recent complete epi week is 17-23 November.
If the report is published the morning of Wednesday 3 December, the most recent reliable day of data is Saturday 29 November, and the most recent complete epi week is still 17-23 November.
If the report is published Thursday 4 December, the most recent complete epi week is 24-30 November.

Filter logic

To apply a date filter to surv, simply add a filter() line to the bottom of your cleaning command.

Start the filter simply, by including only dates without reporting delays. We include dates prior to ( < ) 3 days before the publish date:

(your cleaning pipe chain...) %>% 
filter(date_report < params$publish_date - 3)

If your publish date is 3 December, this should keep only cases reported on Saturday 29 November or earlier. You can check this by running max(surv$date_report, na.rm=TRUE).

Round down to the week

You can use the floor_date() function from the {lubridate} package to "round down" to the start of the week (or month, day, year, etc.).

Set the unit = "week" argument to round down to the week start
Set the week_start = argument to 1 to have the weeks begin on Mondays

(your cleaning pipe chain...) %>% 
filter(date_report < floor_date(params$publish_date - 3,
                                 unit = "week",
                                 week_start = 1))

With a publish date of 3 December, the maximum date_report value should be 23 November. If the publish date is 4 December, the maximum should be 30 November.
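
You can check the cutoff dates directly in the Console. This sketch assumes {lubridate} is installed; the helper `cutoff_for()` is a made-up name that wraps the same floor_date() call used in the filter.

```r
library(lubridate)

# Hypothetical helper: the cutoff used in the filter for a given publish date
cutoff_for <- function(publish_date) {
  floor_date(publish_date - 3, unit = "week", week_start = 1)
}

# Published Wednesday 3 December: the cutoff is Monday 24 November
cutoff_3dec <- cutoff_for(ymd("2014-12-03"))
cutoff_3dec

# Published Thursday 4 December: the cutoff is Monday 1 December
cutoff_4dec <- cutoff_for(ymd("2014-12-04"))
cutoff_4dec
```

Because the filter keeps dates strictly before ( < ) the cutoff, the latest possible date_report is the Sunday before each of these Mondays.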

Include or exclude missing dates

What did your filter do to rows with missing date of report? They were excluded by default, because they were not specifically addressed in the filter logic statement.

Note that there are no NA rows remaining.

summary(surv$date_report)

To keep these rows, add | is.na(date_report) to your filter command, as below. This translates to "...OR, where date_report is NA"

(your cleaning pipe chain...) %>% 
filter(date_report < floor_date(params$publish_date - 3,
                                unit = "week",
                                week_start = 1) | is.na(date_report))

Observe how re-running your cleaning command now will result in more observations.

summary(surv$date_report)
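
The effect of | is.na() can be seen on a small made-up data frame. This sketch assumes {dplyr} is installed; `toy_surv`, `cutoff`, `strict`, and `with_missing` are illustrative names and the dates are invented.

```r
library(dplyr)

# A small made-up linelist with one missing date of report
toy_surv <- data.frame(
  case_id     = 1:4,
  date_report = as.Date(c("2014-11-20", "2014-11-26", NA, "2014-11-10"))
)

cutoff <- as.Date("2014-11-24")

# The comparison returns NA for the missing date, so filter() drops that row
strict <- toy_surv %>%
  filter(date_report < cutoff)

# Adding | is.na(date_report) keeps the row with the missing date
with_missing <- toy_surv %>%
  filter(date_report < cutoff | is.na(date_report))

nrow(strict)
nrow(with_missing)
```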

More Extras

Quarto

The company Posit, which makes RStudio and helped develop R Markdown, is now producing a new script type called Quarto.

Quarto syntax is very similar to R Markdown. We opted to teach you R Markdown because Quarto is still relatively new. However, in the coming years Quarto will be where most new innovation occurs.

While you can write Python, SQL, and Julia code in R Markdown chunks, this will be more streamlined in Quarto scripts.

Read about Quarto and try it if you wish! Just go to "New File" and select "Quarto Document".

Nuances of data import/export in R Markdown

R Markdown is one realm in which using the package {here} and its function here() to import/export data is very useful.

Many legacy R users learned the concept of the "working directory" - the location in your folders that is the starting point for file paths for import/exporting. They also learned the {base} R functions getwd() (to print your current working directory) and setwd() (to define a new working directory). Applied Epi encourages our course participants to be aware of these two functions, but to instead use RStudio projects and the here() function from the {here} package. Used together, all file paths begin from a consistent starting point (the top/"root" of the RStudio project).

With R Markdown scripts use of here() is especially advantageous for two reasons:

The working directory

The first reason is that it simplifies how you specify file paths in your RMD script. Unlike normal .R scripts, the working directory for an R Markdown script is the folder that contains the .Rmd script. This means that if your data are stored in a separate "data/" folder, it can be difficult to write a command that points R towards the dataset.

See for yourself! Run the command getwd() from within an R code chunk - you should see that this path continues all the way to the "rmd_course/scripts/" folder (where this .Rmd is located). Compare this to opening a new "normal" .R script and running getwd() - there you should see the root of the RStudio project only.

Try running this command in the chunk for importing data (note the absence of here()):

# this will NOT work to import your data, from an Rmd in the scripts subfolder
surv <- import("data/clean/surveillance_linelist_clean_20141201.rds")

This command is not working because from the starting point of the chunk's working directory (the scripts folder), there is no "data" folder to enter.

A bad way of solving this problem

How would you tell R to navigate backwards out of the "scripts/" folder and import the file at "course_rmd/data/clean/surveillance_linelist_clean_20141201.rds"?

The below command is a "work around" approach that is fragile and will break easily. We are only showing you because you may encounter it from other R coders.

The two periods "../" in the file path tell R to go "up" one folder from the working directory (.Rmd file location), then proceed to the "data/" folder, the "clean/" folder, and to the dataset. Be warned! If the .Rmd script is moved anywhere else, this command may not work!

# fragile import command that goes "up" one folder from the Rmd location
surv <- import("../data/clean/surveillance_linelist_clean_20141201.rds")

A better solution with `here()`

A better command is written below. It uses here() to always start the file path from the RStudio project root folder. This is flexible - this command will successfully navigate to the "data/clean/" folder no matter where the script is located!

# robust, relative import command that always starts from the R project root
surv <- import(here("data", "clean", "surveillance_linelist_clean_20141201.rds"))

You can read more about here() in this chapter of the Epi R Handbook, and in the package documentation.

here() file paths, as written above (without slashes), have the added advantage of adapting automatically to the slash direction of the computer that is running the command.

`setwd()` is not persistent in R Markdown

The second reason it is better to use here() within R Markdown scripts, is that using setwd() to change the working directory in one chunk does not change it for all chunks!

Putting setwd() in one R Markdown chunk will only change the working directory for that chunk!

There are more complex ways to fix this, but using here() is a much easier and more robust way to solve this problem.

Chunk names

There is an optional naming feature for code chunks. We encourage R Markdown beginners to not use this feature because it can lead to more errors, if you are not careful.

To give the chunk a name, write the name in the line that starts the chunk (that begins with three backticks). Within the curly brackets { }, write the name after the "r" but before the comma! You can write whatever names you wish, as long as they adhere to certain rules.

Chunk names do NOT appear in your report output. This is a common misunderstanding.

Advantages of using chunk names

They can help you organise your code
You can see the names in the chunk navigation tool, clickable at the bottom of your script
If there is an error message, it will return the name of the chunk. If there are no names, it will return the line number.

Click the chunk navigation tool at the bottom of the script to review all the chunks in your script.

knitr::include_graphics("images/names1.png", error = F)

Disadvantages of using chunk names

They can lead to more errors when knitting, if you are not careful to follow their naming rules.

Rules about chunk naming

Chunk names cannot contain spaces. Knitting will return an error if there are spaces.
There cannot be two chunks with the same name. This is a common mistake, because users often copy-and-paste chunks within the script.
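The two rules above can be checked with a few lines of base R. This is only an illustrative sketch: the chunk header lines and chunk names below are hypothetical, and the pattern assumes simple headers where the name follows the "r" and ends at the first comma or closing bracket.

```r
# Hypothetical chunk header lines, as they might appear in an .Rmd script
headers <- c(
  "{r setup, include = FALSE}",
  "{r import_data, echo = FALSE}",
  "{r import_data}",                 # breaks the rule: name used twice
  "{r key numbers, echo = FALSE}"    # breaks the rule: name contains a space
)

# Keep the text after the "r" and before the first comma or closing bracket
chunk_names <- trimws(sub("^\\{r\\s+([^,}]*).*$", "\\1", headers))

# Names that are used more than once
repeated_names <- unique(chunk_names[duplicated(chunk_names)])
repeated_names

# Names that contain a space
names_with_spaces <- chunk_names[grepl(" ", chunk_names)]
names_with_spaces
```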

Knitting to PDF

Knitting directly to PDF can be more difficult, and you may encounter more formatting challenges. To try it, run the following commands in your Console to install the "tinytex" software (if you write them in an R code chunk in your script, put a # in front of them to deactivate them after running):

# installs tinytex, to C: location
tinytex::install_tinytex() 

# checks for installation; should return TRUE (note three colons)
tinytex:::is_tinytex()

Remember, you can always knit to Word, and then convert to PDF.

Add a manual table

Just for your information, review the content below. You do not need to type this into your report.

You can add "manual" or "static" tables in a report. You can do this easily in Visual mode (but beware switching between Visual and Source mode afterwards), or in Source mode using | to designate columns and - dashes to separate the header row and indicate how wide each column should be.

| Response Agency            | Address                  | Point of Contact  |
| -------------------------- | ------------------------ | ----------------- |
| Department of Health       | 107 S. Broad Street      | Malaya Gonzales   |
| Emergency Medical Services | 89 Rue de l'Independence | Francois Cartier  |
| Fire Department            | 2000 Center Drive        | Sgt. Kamala Brown |

The above table was created in Source Mode with the following code. Try pasting it into your script and adjusting the number of - dashes. See how the column width changes.

| Response Agency        | Address             | Point of Contact |
| ---------------------- | ------------------- | -----------------|
| Department of Health   | 107 S. Broad Street | Malaya Gonzales |
| Emergency Medical Services | 89 Rue de l'Independence | Francois Cartier |
| Fire Department | 2000 Center Drive | Sgt. Kamala Brown |

Note that this approach to tables is static. These tables do not change automatically and are not linked to any R object or datasets.

Later, we will review options to present tables of data or summary tables using R packages, within R code chunks.

Formatting tips

If you want to insert extra empty lines or space in your document, insert <br> in the text parts of your Rmd. You can write several of these lines to get more space.

If you want to designate a page break (does not apply to HTML), insert \pagebreak in a Markdown text area (not in a code chunk).

HTML report customisation

If you render your report in HTML, try the following customisations!

Table of contents (TOC)

You can add a table of contents using the YAML toc option and specify the depth of headers using toc_depth.

If the table of contents depth is not explicitly specified, it defaults to 3 (meaning that all level 1, 2, and 3 headers will be included in the table of contents).
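As a sketch, the output section of a YAML header that limits the table of contents to level 1 and level 2 headers might look like this (the value 2 is only an example):

```yaml
output:
  html_document:
    toc: TRUE
    toc_depth: 2
```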

Floating TOC

For long reports, you may want to be able to view the table of contents even as you scroll down. You can specify the toc_float option so that the table of contents stays visible even when the document is scrolled.

---
title: "Surveillance Report"
author: "Your name or agency here"
date: "`r Sys.Date()`"
output: 
  html_document:
    toc: TRUE
    toc_float: TRUE
---

Section numbering

You can add section numbering to headers using number_sections: TRUE in the YAML, beneath the html_document option with the others.

Note that if you do choose to use number_sections, you will likely also want to make the highest-level headings with a single hash # (H1), because ## (H2) headers will include a decimal point if there are no H1 headers (0.1, 0.2, and so on).
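For example, the output section of the YAML header might look like the sketch below, with number_sections indented to the same level as the other html_document options:

```yaml
output:
  html_document:
    toc: TRUE
    toc_float: TRUE
    number_sections: TRUE
```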

Tabs

To create clickable tabs in the document, simply add .tabset in the curly brackets { } that are placed after a heading. Any sub-headings beneath that heading (until another heading of the same level) will appear as tabs that the user can click through.

This feature can make for a very fancy-looking report!

Try it! Add {.tabset} to the # Summary heading near the top of your report.

When you re-knit, the second-level headings (##) for "About this report", "Data import and cleaning", and "Key numbers" should appear as clickable tabs!
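In the text part of your script, the headings would look something like the sketch below. The first four headings are the ones named above; the final heading is a hypothetical placeholder, included only to show that the next heading of the same level ends the set of tabs.

```markdown
# Summary {.tabset}

## About this report

## Data import and cleaning

## Key numbers

# Next section
```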

Further customisation

Let's practice by customising some aspects of our report:

Add a subtitle by adding a subtitle: field
Under html_document: add a table of contents using the toc option
Use the toc_float option to make sure your table of contents is always visible

Your YAML will look something like this:

---
title: "Surveillance Report"
subtitle: "Using Advanced R Markdown Options"
author: "Your name or agency here"
date: "`r Sys.Date()`"
output: 
  html_document:
    toc: TRUE
    toc_float: TRUE
---

Notice the placement of the 2-space indentations under output:. Correct spacing is important in YAML syntax; otherwise, your report will not knit.

quiz(caption = "Quiz - YAML",
  question("What does `toc_depth: 2` do? ",
    allow_retry = T,
    answer("Shows level 1 and level 2 headers in table of contents",
           correct = T,
           message = ""),
    answer("Makes table of contents visible throughout a long document", 
           correct = F,
           message = "This is what `toc_float:` option does"),
    answer("Shows only two code chunks",
           correct = F), 
    answer("All of the above",
           correct = F)
  )
)

Code folding

Depending on your audience, you may sometimes want to have an option of showing the R code for your analysis in the report.

If you set echo = TRUE in the "setup" R code chunk, ALL of the code is shown, which can be overwhelming.

A good middle option is the code_folding feature for HTML outputs.

1) Adjust your YAML so that under html_document: you have code_folding: hide

---
title: "Surveillance Report"
subtitle: "Using Advanced R Markdown Options"
author: "Your name or agency here"
date: "`r Sys.Date()`"
output: 
  html_document:
    toc: TRUE
    toc_float: TRUE
    code_folding: hide
---

2) Importantly, also change the "setup" R code chunk so that echo = TRUE.

3) Knit your document.

You should see small "Show" icons on the right side of report sections, which when clicked will reveal the code for that section. For example, look at the "Key numbers" section, near the top.

knitr::include_graphics("images/code_folding.png", error = F)

At the top-right of the report you will also see a "Code" button that offers a menu to "Show all code" or "Hide all code".

You can read more about code folding here

Themes

You can use the theme option to change the default theme of your document. Valid themes include default, bootstrap, cerulean, cosmo, darkly, flatly, journal, lumen, paper, readable, sandstone, simplex, spacelab, united, and yeti.

Your turn!

Now, go ahead and edit your YAML to change to a theme of your choice.

Click to see a solution (try it yourself first!)

---
title: "Surveillance Report"
subtitle: "Using Advanced R Markdown Options"
author: "Your name or agency here"
date: "`r Sys.Date()`"
output: 
  html_document:
    toc: TRUE
    toc_float: TRUE
    theme: "cerulean"
    code_folding: hide
---

README files

Most folders should have a README file that explains what lives in the folder, how it is updated, how it is used, etc.

In a finished project, the README should help users navigate and understand the contents.
In a project template, the README contains instructions for you to set up the project!

To practice, add a .txt file that is named README.txt to either the root folder of the project, or the scripts folder. Write a brief description of how a colleague can properly run the report using the most up-to-date case data.

You can do this by opening Notepad or similar software, entering the text, and saving it in the folder. If you cannot find a plain text editor, you can use Microsoft Word.

appliedepi/introexercises documentation built on April 22, 2024, 1:01 a.m.
