les <- 7
knitr::opts_chunk$set(echo = TRUE, class.source = "Rchunk", class.output = "Rout")
library(ggplot2)
By now you are in your third course using R, and you will have learned that R is an incredibly useful tool for data analysis and visualisation. You have used R to clean your data, make graphs with ggplot and calculate descriptive statistics, and you are able to do the most common statistical tests in R. In DAUR1, you learned how to generate an html file using Rmarkdown. But those were just the basics.
When working in a lab or other bioinformatics / life science positions, an important part of the work will be communicating about your findings and progress. You will have to write reports, give presentations, maybe write manuals or give weekly updates.
Using Rmarkdown gives you the opportunity to:

A) automate part of these workflows (no copy-pasting graphs from Excel to Word, or typing p-values by hand)
B) easily change your reports when needed (your supervisor read your report and wants a change in outlier selection, which will result in different averages, different graphs and different p-values? Sure, just update a few lines of code and knit your report again!)
C) do all this in a reproducible, open science way (you keep a direct connection between the data, analysis and communication, so everyone, including you in a few months' time when you have forgotten, can easily see what you did)
Additionally, Rmarkdown is just plain text. A big advantage of plain text is that it is really easy to work with. GitHub can handle it well, for instance, and plain text is much easier to write than, for instance, html.
If you are starting to get to know the data science realm, you may have encountered people using Jupyter notebooks or JupyterLab. These are web applications that, like Rmarkdown, combine data, analyses and text. They work with R as well, so that is neat for us! However, they are slightly more difficult to use. While Rmarkdown is in the end plain text, Jupyter notebooks are JSON documents. This is a useful format, but harder to combine with a workflow using git/GitHub.
While we're on the subject, why aren't we teaching you Python? Not because we think there is some epic battle and you will either join the Python or the R side. We are not team R. In fact, most DSFB-teachers can write both. However, Python was developed by software engineers while R was developed by statisticians. So the workflows in using these tools for data science differ. You have some background in statistics but -usually- none in software development, so R will for most of you feel more natural. R is also specifically tailored for data science. As a consequence, R is used more often in the life science work field, academia in general, and for instance pharmaceuticals. Also, Bioconductor is used a lot when analysing high-throughput genomic data, thereby making R the dominant language in bioinformatics.
The three programming languages most often asked for in job ads in English-speaking countries are Python, R and SQL. We will cover SQL in this course (SQL is a database language designed for working with data in relational databases). You have learned some R. If you need to switch from R to Python in the future, this is easily done if you have enough experience in R. The languages are in fact rather similar. If you are comfortable using basic building blocks such as functions, for and while loops, if statements and vectors/dataframes/lists/etc, learning to do this in a different language is quite easy. While the words differ a bit, these building blocks still work in the same way.
Now that we had a look at the alternatives, let's go back to RMarkdown. You may have used a workflow like this in the capstone assignments in DAUR1 and 2:
In the previous lessons, you learned how to collaborate on GitHub, so you and your teammates can work on the same Rmarkdown file. The loop becomes:
This method fits in a 'continuous integration' workflow. You can continue working on the same project with multiple people, and frequently integrate your work to see if there are any errors or you are accidentally working on the same part of the project as a coworker.
For example, when making this reader, your teachers work within their own branches. Usually, we make a branch per task we take on for the day. We make sure we branch off from main, and at least once a day, usually at the end of the day, we merge our work back into the main branch (Continuous integration). Two people may have worked on the same .Rmd file, but this is fine. Git merges them for us. We then make sure the main branch still knits to the html in this reader without crashing or otherwise messing up (Continuous testing). If we think the rest of the team would like to see an update, we publish the newest version of the reader to RSconnect, which publishes this reader as a website (Continuous deployment). You will have noticed that we update this reader frequently to fix any issues you or we have found.
In this lesson we will focus a bit more on RMarkdown, as it offers a reproducible pathway from data to analysis to communication. Keep the following things in mind when starting an RMarkdown file:
Use `knitr::purl("name_of_your_file.Rmd", documentation = 2)` to turn your .Rmd into a script.
Use `knitr::spin("name_of_your_file.R", knit = FALSE, format = "Rmd")` to turn a script into an .Rmd. (Try these commands now)

Roughly, your average deliverable will consist of text, figures and tables. Text is pretty easy in RMarkdown. Just type. We believe in you.
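Coming back to the `purl()` and `spin()` commands mentioned above: the round trip between an .Rmd file and a plain script can be tried safely in a temporary folder. This is a minimal sketch, assuming {knitr} is installed; the file names `demo_report.Rmd` and `demo_script.R` and the chunk label are made up for illustration.

```r
library(knitr)

# Work in a temporary folder so nothing in your project gets overwritten
tmp_dir <- tempfile("purl_demo_")
dir.create(tmp_dir)

# Build a tiny, hypothetical .Rmd: one sentence and one code chunk.
# The chunk fence is assembled with strrep() to keep this example readable.
fence <- strrep("`", 3)
rmd_file <- file.path(tmp_dir, "demo_report.Rmd")
writeLines(
  c("Some explanatory text.", "",
    paste0(fence, "{r sum-chunk}"), "x <- 1 + 1", fence),
  rmd_file
)

# .Rmd -> script; documentation = 2 keeps the text as roxygen-style comments
script_file <- file.path(tmp_dir, "demo_script.R")
purl(rmd_file, output = script_file, documentation = 2, quiet = TRUE)
script_lines <- readLines(script_file)

# script -> .Rmd; knit = FALSE only converts, it does not render anything
spin(script_file, knit = FALSE, format = "Rmd")
spun_file <- file.path(tmp_dir, "demo_script.Rmd")
```

As far as we can tell, `spin()` writes its result next to the script, using the script's name with an `.Rmd` extension. Spinning `name_of_your_file.R` could therefore replace an existing `name_of_your_file.Rmd`, which is why the sketch gives the script a different name.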
But your portfolio and projecticum-products will also contain pictures. There are basically three ways to include an image from an existing file:

- the `include_graphics()` function from {knitr}
- Markdown image syntax
- an html `<img>` tag

You can use the `include_graphics()` function from {knitr} and give the chunk some settings for the width of the figure and the caption:
```r
knitr::include_graphics(
  here::here("images", "prime.jpeg")
)
```
The advantage of this approach is that you can use all the chunk options for this code chunk, controlling e.g. the size and behaviour of the image as you would normally do for code-generated graphs. Setting the `dpi` option in the code chunk controls the size of the image displayed in the rendered output. To see more options, run `?knitr::include_graphics` in the R console.
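As a sketch of what such chunk settings could look like: the width and the caption text below are made-up values, not the ones used for this reader. The `#|` comment lines are an alternative way of writing chunk options inside the chunk itself and assume knitr 1.35 or newer. Writing the same options in the chunk header, as in `{r, out.width = "40%", fig.cap = "..."}`, should be equivalent.

```r
#| out.width = "40%",
#| fig.cap = "A hypothetical caption for this image."
knitr::include_graphics(
  here::here("images", "prime.jpeg")  # path taken from the example above
)
```

Because `out.width` is given as a percentage, the image scales with the width of the page rather than having a fixed pixel size.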
If you would like to use the above option (we usually prefer it because it is an R-only solution), you should know about the {captioner} package. This package helps you with numbering tables and figures, and with it you can create standardized captions or add captions to your figures from another source file. A good workflow would be to write all captions in a separate text file and add them to your R Markdown document using the {captioner} package. In this way you can more easily edit and revise captions separately from the main text. For more info, see the {captioner} package documentation.
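A minimal sketch of how {captioner} can be used, assuming the package is installed. The figure names and caption texts are invented for this example.

```r
library(captioner)

# Create one numbering function for figures; the prefix is printed before the number
fig_nums <- captioner(prefix = "Figure")

# Register two hypothetical figures; each call returns the full caption text
ridge_caption <- fig_nums(name = "ridges", caption = "Diamond price by cut.")
penguin_caption <- fig_nums(name = "penguins", caption = "Penguin body mass.")

# Refer to a figure in the running text without repeating its caption
ridge_reference <- fig_nums("ridges", display = "cite")
```

You could then pass the caption to a chunk with `fig.cap = ridge_caption` and write the reference in your text as inline R code.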
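We can also use Markdown syntax like this. See this blog for more details.

`![caption](path){ width=20%}`

Here `caption` and `path` stand for your own caption text and image path. Note that the path name is not a string, so it is written without quotes.

Setting `{ width=x%}` is a special control for Pandoc, the engine that converts the Markdown generated from your RMarkdown file into the final output format. This does not work for all Markdown dialects.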
The advantage of this approach is that it is very quick to write. You can also include an image from a web url like this

You can also include a local (or web url linked) image using html.
<div style="display: table;">
<span>
<img src="https://static.wikia.nocookie.net/transformers/images/b/bf/Perceptorg1.jpg" style=" vertical-align: middle; display: table-cell;" />
</span>
<span style=" vertical-align: middle; display: table-cell; padding: 20px;">
The advantage of this latter option is obviously that you can tweak the appearance and the behavior of your image far better using embedded html. The disadvantage might be that you would have to learn html. This however is a very good idea if you intend to become a Data Scientist/Bioinformatician.
</span>
</div>
(Yes, you should use CSS if you are getting this deep into setting html div styles. We'll cover that later.)
Adding basic graphs to an RMarkdown is something you already learned in DSFB1.
To display your graphs, you again give the chunk in which you generate them some settings. Let's use a really visually pleasing graph for the occasion (source), using 3 additional packages!
```r
# install.packages("ggridges")
# install.packages("viridis")
# install.packages("ggthemes")
library(viridis)
library(ggplot2)
library(ggridges)
library(ggthemes)

ggplot(data = diamonds, aes(x = price, y = cut, color = cut, fill = cut)) +
  geom_density_ridges(alpha = 0.8, scale = 5) +
  scale_fill_viridis(option = "A", discrete = TRUE) +
  scale_color_viridis(option = "A", discrete = TRUE) +
  theme_few()
```
(Note that previous figures without captions were not included in the numbering)
The default colours in ggplot are not equally visible for everyone. You may want to use more colour-blind-friendly palettes for your Rmarkdown documents, because you are almost always making them for a larger public than just yourself. (Also, you might be colour-blind yourself.) We can set colours manually:
# A colour-blind-friendly palette, defined by hand
cbp1 <- c("#999999", "#E69F00", "#56B4E9", "#009E73",
          "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

library(palmerpenguins)

mass_hist <- ggplot(data = penguins, aes(x = body_mass_g)) +
  geom_histogram(aes(fill = species), alpha = 0.8, position = "identity") +
  scale_fill_manual(values = c("darkorange", "purple", "cyan4")) +
  theme_minimal() +
  labs(x = "Body mass (g)", y = "Frequency", title = "Penguin body mass")

# Replace the fill scale with the colour-blind-friendly palette
mass_hist + scale_fill_manual(values = cbp1)
But the RColorBrewer package has some options too:
library(RColorBrewer)
display.brewer.all(colorblindFriendly = TRUE)

# Box plot
bp <- ggplot(iris, aes(Species, Sepal.Length)) +
  geom_boxplot(aes(fill = Species)) +
  theme_minimal() +
  theme(legend.position = "top")
bp + scale_fill_brewer(palette = "Dark2")

# Scatter plot
sp <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(aes(color = Species)) +
  theme_minimal() +
  theme(legend.position = "top")
sp + scale_color_brewer(palette = "Paired")
And you could check out the “viridis” and “magma” scales in the viridis package here.
If you are seriously interested in making figures for colourblind people, check this website, especially figure 16.
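If you would rather not type hex codes by hand: as far as we can tell, the colours in `cbp1` above are those of the Okabe-Ito palette (with grey in place of black), which ships with base R as `palette.colors()`. A minimal sketch, assuming R 4.0 or newer:

```r
# Okabe-Ito palette from base R (grDevices), available since R 4.0.0
okabe_ito <- palette.colors(n = 8, palette = "Okabe-Ito")

# The manually typed palette from the text above, for comparison
cbp1 <- c("#999999", "#E69F00", "#56B4E9", "#009E73",
          "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

# Apart from the first entry (grey versus black) the colours are the same
shared <- intersect(toupper(cbp1), toupper(okabe_ito))

# The vector carries colour names; drop them before handing it to ggplot2,
# because scale_fill_manual() matches named values to the levels in your data
okabe_ito_values <- unname(okabe_ito)
```

`mass_hist + scale_fill_manual(values = okabe_ito_values)` would then apply this palette to the penguin histogram.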
In DAUR1 you learned how to knit tibbles into tables:
data("mpg")
knitr::kable(head(mpg))

But you can make them interactive if you like:

data("mpg")
library(DT)
DT::datatable(mpg)
Of course we can generate and format more complex tables as well. The easiest way is to use the kableExtra package (though check the expss package if some journal or person is insisting you should make APA-style tables).
# install.packages("kableExtra")
library(kableExtra)

# take just a few columns and rows for demonstration purposes
data_for_table <- mtcars[1:5, 1:6]
If you do not define anything, kable_styling() will render you the same table as the default knitr::kable
data_for_table %>% kbl() %>% kable_styling()
With kable_styling() and the other kableExtra functions you can change the settings, such as colours depending on the values displayed, or add footnotes and extra headers:
kbl(data_for_table) %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE, position = "left")

data_for_table %>%
  kbl() %>%
  kable_classic_2(full_width = FALSE) %>%
  row_spec(0, angle = -45) %>%
  column_spec(2, color = spec_color(mtcars$mpg[1:5]),
              link = "https://www.wikipedia.com") %>%
  column_spec(5, color = "white",
              background = spec_color(mtcars$hp[1:5], end = 0.7),
              popover = paste("am:", mtcars$am[1:5]))

kbl(data_for_table) %>%
  kable_classic() %>%
  add_header_above(c(" " = 1, "Group 1" = 2, "Group 2" = 2, "Group 3" = 2)) %>%
  footnote(general = "Here is a general footnote for the table.")
If you want something specific, check out the vignette for the kableExtra package here.
You can also write tables within RMarkdown:
| Syntax      | Description |
| ----------- | ----------- |
| Header      | Title       |
| Paragraph   | Text        |

renders:

| Syntax      | Description |
| ----------- | ----------- |
| Header      | Title       |
| Paragraph   | Text        |
Which is a lot of work, so you would usually use this table generator
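Besides an online generator, you can let R write this kind of table source for you. A minimal sketch, assuming a reasonably recent {knitr} that supports `format = "pipe"`; the data frame simply repeats the content of the hand-written table above.

```r
library(knitr)

# A small data frame holding the same content as the hand-written table
syntax_table <- data.frame(
  Syntax = c("Header", "Paragraph"),
  Description = c("Title", "Text")
)

# format = "pipe" returns the table as Markdown pipe-table source lines
pipe_table <- kable(syntax_table, format = "pipe")
print(pipe_table)
```

The printed lines can be pasted into the text part of an .Rmd file, where they render as a table.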
You can write formulas in RMarkdown using LaTeX. The formula for a straight line with slope `a` and intercept `b`:

$Y = aX + b$
will give you:
$Y = aX + b$
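Single dollar signs give an inline formula. As a small additional illustration (the formula is just an example, not taken from this reader), double dollar signs put a formula on its own centred line, here the sample mean of $n$ values $x_1, \dots, x_n$:

```latex
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
```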
You can use more latex syntax, but not all of it will work when rendering html files. For instance:
\begin{equation}
\label{eq-abc}
\binom{n}{k} = \frac{n!}{k!(n-k)!}
\end{equation}

Renders the equation, but not the label:

\begin{equation}
\label{eq-abc}
\binom{n}{k} = \frac{n!}{k!(n-k)!}
\end{equation}
When rendering PDF, you can also use it to change appearance like font size:
\footnotesize tiny text! not tiny in html. \normalsize
For HTML, you will want to use CSS formatting; we will discuss this later in this lesson. To build more complex formulas you can use an online LaTeX equation builder, for instance this one.
You will really want to start focusing on using RMarkdown for professional deliverables. To do so, sometimes you want to edit the default way RMarkdown renders your output.
We already showed you can include images with html coding. You could use html syntax for everything in your RMarkdown, but then why are you using RMarkdown in the first place? However, there are a few html things that may come in handy. Do remember that these mainly work when rendering HTML files, and will break if you render PDFs. This may or may not be a major disadvantage depending on the project.
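To get a line break in RMarkdown, you can finish a line with 2 spaces, and then hit return. If you want more than one blank line, you can use html commands, for example one `<br>` tag per extra blank line:

some text bla bla
<br><br><br>
some more text after 3 blank lines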
Will yield:
some text bla bla
some more text after 3 blank lines
If you want to highlight something (for instance because you don't want to forget adding something later), you can include tags, eg:
<mark> XXX more html here </mark>
`<div>` tags work too; these define a division or a section in an HTML document. We will use them later.
Use three or more `-` for a horizontal rule. For example,

text
#mind the following blank line

---
yields:
You can include links like this:
[click here for the link](https://en.wikipedia.org/wiki/Open_science)
You learned some chunk settings in DAUR1:
Table with different chunk output options. yes means present in the output file.
Option | Run code | Show code | Output | Plots | Message | Warning
:----|:-----|:---|:---|:---|:---|:---
eval = FALSE | no | yes | no | no | no | no
include = FALSE | yes | no | no | no | no | no
echo = FALSE | yes | no | yes | yes | yes | yes
results = "hide" | yes | yes | no | yes | yes | yes
fig.show = "hide" | yes | yes | yes | no | yes | yes
message = FALSE | yes | yes | yes | yes | no | yes
warning = FALSE | yes | yes | yes | yes | yes | no
For instance, when writing a report, it's common to not want the R code to actually show up in the final document.
Use the `echo` chunk option to do this:

```{r, echo=FALSE}
1 + 1
```

or if you want the code to run but not show anything, use `include`:

```{r, include=FALSE}
1 + 1
```

Sometimes you may just want to show some R code with nice syntax highlighting but not evaluate it. Use `eval` for that:

```{r, eval=FALSE}
1 + 1
```
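The effect of `echo`, `include` and `eval` can also be checked from the R console, without making a whole document. This is a minimal sketch, assuming {knitr} is installed. The helper functions `knit_one_chunk()` and `shows()` are made up for this example and are not part of knitr.

```r
library(knitr)

# Knit a one-chunk document given as text and return the resulting Markdown.
# `options` is the text that goes inside the chunk header, e.g. "echo = FALSE".
knit_one_chunk <- function(options) {
  fence <- strrep("`", 3)
  rmd_lines <- c(paste0(fence, "{r, ", options, "}"), "1 + 1", fence)
  knit(text = rmd_lines, quiet = TRUE)
}

echo_false    <- knit_one_chunk("echo = FALSE")
include_false <- knit_one_chunk("include = FALSE")
eval_false    <- knit_one_chunk("eval = FALSE")

# Does the knitted Markdown contain the code and/or the result?
shows <- function(md) {
  c(code   = any(grepl("1 + 1", md, fixed = TRUE)),
    result = any(grepl("[1] 2", md, fixed = TRUE)))
}

rbind(echo_false    = shows(echo_false),
      include_false = shows(include_false),
      eval_false    = shows(eval_false))
```

The resulting matrix should agree with the corresponding rows of the table of chunk options above. Run this in the console rather than inside a document that is itself being knitted.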
Don't forget you can give chunks a name, so you can easily find them again. The name goes directly after the `r` in the chunk header.
You have used these options before. But there are more options you can give to chunks:
If you know a chunk will not need to change as other parts of the document are knitted, you can cache a chunk that contains a potentially long-running or slow command or commands. Be aware that this means this chunk will not update if you change something in your data wrangling part.
```{r, cache=TRUE}
library(ggplot2)
ggplot(mpg, aes(displ, hwy, color = class)) +
  geom_point() # Some really slow plot
```
Customize output sizing with the chunk options `fig.width`, `fig.height`, `dpi` and `out.width`, for example:

```r
ggplot(mpg, aes(displ, hwy, color = class)) +
  geom_point()
```
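The sizing options listed above can be set per chunk, but also once for the whole document. This is a minimal sketch of the document-wide variant, assuming {knitr} is installed. All values are made up for illustration and are not the settings used in this reader.

```r
library(knitr)

# Document-wide defaults for figure sizing (all values are hypothetical)
opts_chunk$set(
  fig.width  = 6,      # width of the plotting device, in inches
  fig.height = 4,      # height of the plotting device, in inches
  dpi        = 300,    # resolution used for bitmap output such as PNG
  out.width  = "80%"   # width of the figure in the rendered document
)

# Check what is currently set
opts_chunk$get(c("fig.width", "fig.height", "dpi", "out.width"))
```

The same options work in a single chunk header, for example `{r, fig.width = 6, fig.height = 4, dpi = 300, out.width = "80%"}`. Options in a chunk header override the document-wide defaults for that chunk.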
If you are familiar with writing html, you will know that most styling is done with CSS. Cascading Style Sheets (CSS) is a simple mechanism for adding style (e.g., fonts, colors, spacing) to Web documents.
By default, the HTML output of R Markdown includes some predefined CSS classes for backgrounds: "bg-primary", "bg-success", "bg-info", "bg-warning", and "bg-danger".
Here we use bg-danger for the code, and bg-warning for the output. Try them out to see what they look like.

```{r class.source="bg-danger", class.output="bg-warning"}
mtcars[1:5, "mpg"]
```
But you can make your own styles as well:
```{css, echo=FALSE}
.watch-out {
  background-color: lightpink;
  border: 3px solid red;
  font-weight: bold;
}
```
Then we assign a class `watch-out` to the code chunk via the chunk option `class.source`.

```{r class.source="watch-out"}
mtcars[1:5, "mpg"]
```
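To see what `class.source` actually does, you can knit a chunk to Markdown only and look at the intermediate result, before Pandoc turns it into html. This is a minimal sketch, assuming {knitr} is installed and using its default Markdown output. The class name `watch-out` is the one from the text above.

```r
library(knitr)

# Knit a single chunk that carries the custom class, supplied as text
fence <- strrep("`", 3)
rmd_lines <- c(
  paste0(fence, "{r, class.source = 'watch-out'}"),
  "mtcars[1:5, 'mpg']",
  fence
)
knitted_md <- knit(text = rmd_lines, quiet = TRUE)

# Print the intermediate Markdown; the class is attached to the source block
cat(knitted_md)
```

In the printed Markdown the class name appears in the attributes of the code block, and this is what the CSS rule later hooks onto in the html output.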
But defining all your styling stuff within your .Rmd gets messy quite quickly. So you can put them in a separate .css file. Put this file in the same folder as your .Rmd file. To link one or multiple custom stylesheets to an Rmd document, you use the css option in your YAML header:
output:
  html_document:
    css: "my-style-sheet.css"
Within your .css file, you then define all the styling you want.
For instance, define a class for text with borders:
In the .css file
.bordershere {
  border-style: solid;
  border-width: 5px;
  border-color: purple;
}
You are not restricted to code chunks when using css. You can use html `<div>` tags to apply a css style to part of your .Rmd, which may contain anything: text, images, code, etc:
In the .Rmd file
<div class="bordershere"> text </div>
Here is an example to get you started with a css file: click
And if you really like formatting, check w3schools for loads of syntax: click
Note: if you use your own .css file, the predefined CSS classes (bg-danger etc) won't work anymore. But they are not the prettiest anyway.