knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.align = "center"
)

library(tidyverse)

Dealing with sample names

separate takes a column containing multiple variables on input and returns multiple columns, each with a new variable. For example, a column with year/month/day information can be separated into invidual columns.

ys <- 1999:2002
ms <- c('Jan', 'Feb', 'Mar')
ds <- 1:10

dates <- tidyr::crossing(ys, ms, ds) %>% unite(date, ys:ds, sep = '-')

separate

dates

# separate is the inverse of unite
dates %>% separate(date, into = c('year', 'month', 'day'), sep = '-')

The sep argument can take:

Finally the extra and fill arguments to separate control what happens when there are too many and not enough variables.

crossing

crossing is useful for generating combinations of variables in tibble format. For example, use crossing to generate combinations of experimental varaibles including sample names, gene names, reaction conditions, and replicates.

genotype <- c('wt', 'mut')
gene <- c('IFN', 'ACTIN')
time <- c(0, 12, 24, 48)
rt <- c('+', '-') # reverse transcriptase added?
rep <- 1:3

samples <- tidyr::crossing(genotype, gene, time, rep, rt)

samples

Data in the 96-well plate format.

Now we'll use tidy data principles to analyze some qPCR data.

Many biological assays make use of the 96 (or 384) well plate. Note the similarity between the plate and a tibble: there are rows and columns, and each well contains a reaction that will generate one or more data points.

plate

Sample names

All variables should be systematically listed in your sample names, i.e. name_rep_time_RT. Systematic naming makes it easy to extract relevant information.

Take this example, where the sample names are a combination of a genotype (WT and MT), a time point (0,4,8,24 hour), and a replicate (1,2,3), separated by a hyphen.

library(tidyverse)

# for reproducible `sample`
set.seed(47681)

samples <-
  tidyr::crossing(
    name = c('WT', 'MT'),
    hours = c('t0', 't4', 't8', 't24'),
    reps = 1:3
    ) %>%
  mutate(
    value = sample(1:100, n(), replace = TRUE),
    .id = row_number()
    ) %>%
  unite('sample.name', name, hours, reps, sep = '-') %>%
  select(-.id)

samples

Extracting sample names

Because the samples have systematic names, it is easy to separate this information into individual columns.

sample_info <- samples %>%
  tidyr::separate(
    sample.name,
    into = c('sample', 'hour', 'rep'),
    sep = "-"
  )

sample_info

Data manipulation

Now we can use dplyr and tidyr functions to manipulate the data.

# calculate summary statistics
sample_info %>% group_by(sample, hour) %>% summarize(mean(value))

# subtract a background value. N.B.: rearranging the table makes this calculation easy.
sample_info %>% spread(hour, value) %>% mutate(t24_norm = t24 - t0)

qPCR data

The class library provides two related tibbles that describe a simulated qPCR experiment called qpcr_names and qpcr_data.

library(pbda)

qpcr_names

qpcr_data

We will use tidying concepts to prepare this data for efficient analysis and visualization.

qPCR data tidying

qpcr_names_tidy <- qpcr_names %>% gather(col, value, -row)
# `exp` is the relative expression level
qpcr_data_tidy <- qpcr_data %>% gather(col, exp, -row)

qpcr_data_tidy

Sample names

qpcr_names_tidy <- separate(
  qpcr_names_tidy,
  value,
  into = c('sample', 'time', 'gene', 'rt', 'rep'),
  sep = '_'
)

qpcr_names_tidy

Data joining

qpcr_tidy <- left_join(qpcr_names_tidy, qpcr_data_tidy)

qpcr_tidy

Statistical summary

qpcr_tidy %>%
  filter(rt == "+") %>%
  group_by(sample, gene, time) %>%
  summarize(mean_exp = mean(exp), var_exp = var(exp))

Plots

ggplot(qpcr_tidy, aes(x = time, y = exp, color = rt)) + geom_point(size=3) + facet_wrap(~gene)

Rmarkdown

Code Chunks

Chunk options control the behaviour of code chunks. Some useful settings:

Code folding lets you embed collapsible code chunks in your rendered HTML document. Please use this for your assignments.

---
title: "Hide my code!"
output:
  html_document:
    code_folding: hide
---

Organize your code blocks for easy reading.. 80 characters per line, and break pipes into pieces.

# nope
mtcars %>% ggplot(aes(x = mpg, y = hp)) + geom_point('red') + geom_line()

# yep
mtcars %>%
  ggplot(aes(x = mpg, y = hp)) +
  geom_point(color = 'red') +
  geom_line()

Writing notes on your analysis

# Hypothesis

# Approach

## Statistics

## Plots

# Interpretations

## Conclusions

Making quality plots

Figure legends

library(tidyverse)
iris %>% ggplot(aes(Sepal.Length, Sepal.Width)) + geom_point() + facet_grid(~Species)

Figure titles

ggplot(mtcars, aes(x = mpg, y = hp, color = factor(am))) +
  geom_point() +
  ggtitle("Motor Trend Cars (colored by transmision type)")

Color

Palettes

p <- ggplot(iris, aes(x = Petal.Width, y = Petal.Length)) +
  geom_point(size = 5, aes(color = factor(Species)))

p + scale_color_brewer(palette = 'Set1')
library(viridis)
p + scale_color_viridis(discrete = TRUE)

Scales (Discrete and continuous)

Facets

Separating your data by groups is a powerful way to visualize differences between them.

mtcars %>% ggplot(aes(x = mpg, y = hp)) + facet_grid(gear ~ am) + geom_point()

Themes

The cowplot package provides a variety of sane defaults for your plots. These are especially useful when making publication-quality figures.

# install.packages('cowplot')
library(cowplot)
ggplot(mtcars, aes(x = mpg, y = hp)) + geom_point() + facet_grid(~cyl)

# note the formatting of the `geom_point` section
ggplot(mtcars, aes(x = mpg, y = hp)) +
  facet_grid(~cyl) +
  geom_point(
    data = mtcars %>% select(-cyl),
    color = "grey",
    alpha = 0.3
  ) +
  geom_point(color = 'red', size = 2)

Tables

The DT library provides dynamic table with search capabilities.

# Search for "Mazda"
library(DT)
DT::datatable(mtcars)

knitr::kable() will display a static table.

knitr::kable(mtcars)

Resources

Getting help

Quiz

Create an RMarkdown document for Problem Set 1. Submit your final document as "Quiz 1" on Canvas by Friday at 5 PM.

Your submitted document must knit to HTML without errors. I.e., when you click the "Knit" button, the document should build and display and HTML page.



rnabioco/eda documentation built on July 12, 2022, 2:17 p.m.