# R Markdown

library(ggplot2)

## Introduction

### What is R Markdown

R Markdown provides a unified authoring framework for data science, combining your code, its results, and your prose commentary. R Markdown documents are fully reproducible and support dozens of static and dynamic output formats, including PDFs, Word files, slideshows, and more. For a comprehensive resource on R Markdown, see R Markdown: The Definitive Guide.

R Markdown integrates a number of R packages and external tools. This means that help is, by-and-large, not available through ?. Instead, as you work through this chapter, and use R Markdown in the future, keep these resources close to hand:

Both cheatsheets are also available at http://rstudio.com/cheatsheets.

The real point of R Markdown is that it lets you include your code, have the code run automatically when your document is rendered, and seamlessly include the results of that code in your document.

### Code languages

Although there is an "R" in R Markdown, it is not limited to the R programming language. It is a generic markup language that can knit any code that can be run from a command line into a document. Some of the more popular languages are:

You can intermix languages within the same document, and in some cases pass data between languages.

```{block2, type='rmdimportant'}
You do not have to include any programming language. There are many books, websites, and wikis written in R Markdown which contain no code.
```

## Markdown Basics

Format the text in your R Markdown file with [Pandoc’s Markdown](https://pandoc.org/MANUAL.html#pandocs-markdown), a set of markup annotations for plain text files. When you render your file, Pandoc transforms the marked up text into formatted text in your final file format.

Notice that the file contains three types of content:

1. Text mixed with simple text formatting.
1. Code chunks and the corresponding output.
1. An (optional) YAML header controlling the "whole document" settings.

### Headers

The character `#` at the beginning of a line means that the rest of the line is interpreted as a section header. The number of `#`s at the beginning of the line indicates whether it is treated as a section, sub-section, sub-sub-section, etc. of the document.

`# Heading Level 1`

`## Heading Level 2`

`### Heading Level 3`

In this document the chapter title **R Markdown** is preceded by a single `#`, **Markdown Basics** at the start of this section is preceded by `##`, and the current subsection **Headers** is preceded by `###` in the text file.

### Paragraph Breaks and Forced Line Breaks

To insert a break between paragraphs, include a single completely blank line.

To force a line break, put two blank
spaces at the end of a line.

To insert a break between paragraphs, include a single completely blank line.

To force a line break, put two blank  
spaces at the end of a line.

If you don't put the two blank spaces at the end of a line they will run together.

If you don't put the two blank
spaces at the end of a line
they will run together.

### Italics and Boldface

Text to be italicized goes inside a single set of underscores or asterisks. Text to be boldfaced goes inside a double set of underscores or asterisks.

Text to be _italicized_ goes inside _a single set of underscores_ or *asterisks*.  Text to be **boldfaced** goes inside a __double set of underscores__  or **asterisks**.

### Bullet Points
* This is a list where items are marked with bullet points.
* Each item in the list should start with a `*` (asterisk) character, or a single dash (`-`) and then have a space.
* Each item should also be on a new line.
    + Indent lines with 4 spaces and begin them with `+` for sub-bullets.
    + Sub-sub-bullets aren’t really a thing in R Markdown.

### Numbered Lists
  1. Lines which begin with a numeral (0–9), followed by a period, will usually be interpreted as items in a numbered list.
  2. R Markdown handles the numbering in what it renders automatically, so the actual number doesn't matter.
  3. This can be handy when you lose count or don’t update the numbers yourself when editing. (Look carefully at the .Rmd file for this item.)
      a. Sub-lists of numbered lists, with letters for sub-items, are a thing.
      b. They are however a fragile thing, which you’d better not push too hard.
1. Lines which begin with a numeral (0–9), followed by a period, will usually be interpreted as items in a numbered list.
1. R Markdown handles the numbering in what it renders automatically, so the actual number doesn't matter.
1. This can be handy when you lose count or don’t update the numbers yourself when editing. (Look carefully at the .Rmd file for this item.)
    a. Sub-lists of numbered lists, with letters for sub-items, are a thing.
    b. They are however a fragile thing, which you’d better not push too hard.

## Math

R Markdown uses standard LaTeX to render complex mathematical formulas and derivations, and displays them very nicely. Like code, the math can either be inline or set off (displays).

Inline math is marked off with a pair of dollar signs (`$`), as in `$\frac{a+b}{b} = 1 + \frac{a}{b}$`, which renders as $\frac{a+b}{b} = 1 + \frac{a}{b}$.

Mathematical displays are marked off with `\[` and `\]`, as in

`\[ \frac{a+b}{b} = 1 + \frac{a}{b} \]`

\[ \frac{a+b}{b} = 1 + \frac{a}{b} \]

## Code Chunks

Think of a chunk like a function. A chunk should be relatively self-contained, and focused around a single task.  You can quickly insert chunks like these into your file with

* the keyboard shortcut **Ctrl + Alt + I** (OS X: **Cmd + Option + I**)
* typing the chunk delimiters ```` ```{r} ```` and ```` ``` ````.
* the *Add Chunk* command in the editor toolbar

When you render your .Rmd file, R Markdown will run each code chunk and embed the results beneath the code chunk in your final report.

There are three main parts to the code chunk header:

1. the programming language engine to run the code
2. the code chunk name (optional but very useful)
3. the code chunk options.

### Chunk Names

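Chunks can be given an optional name, written after the engine in the chunk header, for example ```` ```{r chunk-name} ```` (here `chunk-name` is a placeholder). This has three advantages: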

1.  You can more easily navigate to specific chunks using the drop-down
    code navigator in the bottom-left of the script editor.
1.  Graphics produced by the chunks will have useful names that make
    them easier to use elsewhere.
1.  You can set up networks of cached chunks to avoid re-performing expensive
    computations on every run. More on that below.

It’s a good idea to name code chunks that produce figures, even if you don’t routinely label other chunks. The chunk label is used to generate the file name of the graphic on disk, so naming your chunks makes it much easier to pick out plots and reuse them in other circumstances (e.g. if you want to quickly drop a single plot into an email).

```{block2, type='rmdimportant'}
There is one chunk name with special behaviour: `setup`. In notebook mode, the chunk named `setup` will be run automatically once, before any other code is run.
```

### Chunk Options

Chunk output can be customized with options, which are arguments supplied to the chunk header. Knitr provides almost 60 options that you can use to customize your code chunks. Here we'll cover the most important chunk options that you'll use frequently. You can see the full list at http://yihui.name/knitr/options/.

The most important set of options controls if your code block is executed and what results are inserted in the finished report:

To set global options that apply to every chunk in your file, call knitr::opts_chunk$set in a code chunk. Knitr will treat each option that you pass to knitr::opts_chunk$set as a global default that can be overridden in individual chunk headers. If you are writing a report in which you never want to show code, include knitr::opts_chunk$set(echo = FALSE) in the setup code chunk.

\```{r}
knitr::opts_chunk$set(echo = FALSE)
\```
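
The global defaults described above can also be inspected and restored from the console. The following is a minimal sketch, assuming the knitr package is installed; the particular options chosen (`echo`, `message`, `warning`) are only an illustration.

```r
# Sketch only: assumes the knitr package is installed.
library(knitr)

# Remember the current defaults so they can be restored afterwards.
old_opts <- opts_chunk$get()

# Set document-wide defaults, typically done once in the setup chunk.
opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)

# Look up the new default. A chunk header such as {r, echo = TRUE}
# would still override it for that one chunk.
echo_default <- opts_chunk$get("echo")
echo_default

# Restore the previous defaults (handy when experimenting interactively).
opts_chunk$restore(old_opts)
```

Saving and restoring the defaults is only needed when trying options out interactively; inside a document the setup chunk usually just calls `opts_chunk$set()` once.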

## Inline Code

Code output can also be seamlessly incorporated into the text, using inline code. This is code not set off on a line by itself, but beginning with `` `r `` and ending with `` ` ``. Using inline code is how this document knows that the mtcars data set contains `r nrow(mtcars)` rows, and that the mean mpg of the 6 cylinder cars is `r round(mean(mtcars[mtcars$cyl == 6, "mpg"]), 1)`.

R Markdown will always

As a result, inline output is indistinguishable from the surrounding text. Inline expressions do not take knitr options.
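
To see what the two inline expressions mentioned above evaluate to, they can be run in the console. This is a small sketch using only base R and the built-in mtcars data set; the variable names are chosen here for illustration.

```r
# Values that the inline expressions in the text produce.
n_rows <- nrow(mtcars)
mean_mpg_6cyl <- round(mean(mtcars[mtcars$cyl == 6, "mpg"]), 1)

n_rows         # 32
mean_mpg_6cyl  # 19.7

# The same values pasted into a sentence, which is roughly what the
# rendered paragraph looks like once the inline code has been evaluated.
sentence <- paste0(
  "The mtcars data set contains ", n_rows,
  " rows, and the mean mpg of the 6 cylinder cars is ", mean_mpg_6cyl, "."
)
sentence
```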

## Plots

Inserting a graph into R Markdown is easy; adjusting the formatting may take more effort.

\```{r}
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) + 
  geom_boxplot() +
  coord_flip()
\```

By default, R Markdown will place graphs by maximizing their height, while keeping them within the margins of the page and maintaining aspect ratio. If you have a particularly tall figure, this can mean a really huge graph. To manually set the figure dimensions, use fig.height and fig.width in the code chunk options.

You can also set the alignment with fig.align, using left, right, or center. By default, the file type of the plot is determined by the output file type; you can override this with the dev option in the code chunk options. See R Markdown: The Definitive Guide for more details.
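
The figure options above can be set for a single chunk in its header, for example `{r, fig.width = 6, fig.height = 4, fig.align = "center"}`, or as defaults for the whole document. The sketch below shows the second approach, assuming the knitr package is installed; the sizes and the `png` device are arbitrary choices for illustration.

```r
# Sketch only: assumes the knitr package is installed; values are arbitrary.
library(knitr)

opts_chunk$set(
  fig.width  = 6,         # figure width in inches
  fig.height = 4,         # figure height in inches
  fig.align  = "center",  # "left", "right", or "center"
  dev        = "png"      # graphics device used to record the plots
)

# Confirm what later chunks will use unless their header overrides it.
fig_defaults <- opts_chunk$get(c("fig.width", "fig.height", "fig.align", "dev"))
fig_defaults
```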

## Tables

By default, R Markdown displays data frames and matrices as they would be in the R terminal (in a monospaced font). This generally is not what you want. If you prefer that data be displayed with additional formatting there are a number of packages that can make publication ready tables.

Which package you should use depends on what your output format is and which features you want. All these packages have vignettes which show how to use the major features.

library(kableExtra)
dt <- mtcars[1:5, 1:6]
kable(dt, caption = "kable table", booktabs = TRUE)
kable(mtcars, longtable = TRUE, booktabs = TRUE, 
      caption = "Kable Longtable") %>%
  add_header_above(c(" ","Group 1" = 5, "Group 2" = 6)) %>%
  kable_styling(latex_options = c("repeat_header"))

## YAML

At the top of any R Markdown document is a YAML header section enclosed by ---. By default this includes a title, author, date and the file type you want to output to. Many other options are available for different functions and formatting; see here for .html options and here for .pdf options. Options set in the header section apply to the whole document. Have a quick flick through to familiarize yourself with the sorts of things you can alter by adding an option to the YAML header.

### Table of Contents

---
title: "My Report"
output:
  pdf_document:
    toc: true
    toc_depth: 2
    number_sections: true
---

## Parameterized reports

One of the many benefits of working with R Markdown is that you can reproduce analysis at the click of a button. This makes it very easy to update any work and alter any input parameters within the report. Parameterized reports extend this one step further, and allow users to specify one or more parameters to customize the analysis. This is useful if you want to create a report template that can be reused across multiple similar scenarios.

### Declaring parameters

Parameters are specified using the params field within the YAML section. We can specify one or more parameters with each item on a new line. As an example:

---
title: "My Report"
output: pdf_document
params:
  year: 2018
  region: Europe
  data: file.csv
---

All standard R types can be included as parameters, including character, numeric, integer, and logical types. We can also use R objects by including !r before R expressions. For example, we could include the current date with the following R code:

---
title: "My Report"
output: html_document
params:
  date: !r Sys.Date()
---
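
To get a feel for how plain YAML values map to R types, the first header above can be parsed by hand. This is only a sketch: it assumes the yaml package is installed (rmarkdown depends on it), and the parsing that rmarkdown performs while knitting may differ in details, for instance in how `!r` expressions are evaluated.

```r
# Sketch only: assumes the yaml package is installed.
library(yaml)

# The header from the first example, written as a string.
header <- "
title: 'My Report'
output: pdf_document
params:
  year: 2018
  region: Europe
  data: file.csv
"

# Parse the YAML and keep just the params field.
parsed_params <- yaml.load(header)$params

# Show the R type that each parameter was read as.
sapply(parsed_params, class)
```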

### Using parameters

You can access the parameters within the knitting environment and the R console in RStudio. The values are contained within a read-only list called params. In the previous example, the parameters can be accessed as follows:

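# Individual parameters are accessed like elements of a list
params$year
params$region

# Read the data file named in the parameters (read_csv() comes from the readr package)
mydata <- readr::read_csv(params$data)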
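
When developing interactively without knitting, a stand-in list can play the role of params. The sketch below uses invented data (the `sales` data frame and its columns are hypothetical) to show how parameter values might drive a filter and a title. Inside a knitted document the params list is created for you and is read-only, so the first assignment would not be needed there.

```r
# Stand-in for the read-only list that R Markdown creates while knitting.
params <- list(year = 2018, region = "Europe")

# Hypothetical data, invented purely for illustration.
sales <- data.frame(
  year   = c(2017, 2018, 2018, 2018),
  region = c("Europe", "Europe", "Asia", "Europe"),
  amount = c(10, 20, 30, 40)
)

# Keep only the rows that match the report's parameters.
selected <- sales[sales$year == params$year & sales$region == params$region, ]

# Use the parameters in text and in a summary.
report_title <- paste("Sales in", params$region, "for", params$year)
total <- sum(selected$amount)
```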

### Knitting with parameters

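# Render MyDocument.Rmd for one region and year,
# writing a separately named PDF for each combination.
render_report <- function(file, region, year) {
  rmarkdown::render(
    "MyDocument.Rmd",
    params = list(
      region = region,
      year   = year,
      data   = file
    ),
    output_file = paste0("Report-", region, "-", year, ".pdf")
  )
}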
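
A function like the one above is typically called once per parameter combination. The following sketch builds a small grid of combinations and loops over it. The regions, years, and the helper name `render_one` are made up for illustration, and the rendering step only runs if a file called MyDocument.Rmd actually exists in the working directory (and the rmarkdown package is installed).

```r
# Hypothetical parameter combinations; regions and years are invented.
reports <- expand.grid(
  region = c("Europe", "Asia"),
  year   = c(2017, 2018),
  stringsAsFactors = FALSE
)

# One output file name per combination.
reports$output_file <- paste0("Report-", reports$region, "-", reports$year, ".pdf")

# Hypothetical helper mirroring the function above, kept self-contained here.
render_one <- function(region, year, output_file,
                       input = "MyDocument.Rmd", data = "file.csv") {
  rmarkdown::render(
    input,
    params      = list(region = region, year = year, data = data),
    output_file = output_file
  )
}

# Render every combination, but only when the template is available.
if (file.exists("MyDocument.Rmd")) {
  for (i in seq_len(nrow(reports))) {
    render_one(reports$region[i], reports$year[i], reports$output_file[i])
  }
}
```

Keeping the combinations in a data frame makes it easy to see in advance which files will be produced.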

## Output Types

There are many types of output with more being added every day. lists just the ones



DavisBrian/rclassnotes documentation built on May 17, 2019, 8:19 a.m.