knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.align = "center" ) library(tidyverse)
separate
takes a column containing multiple variables on input and returns multiple columns, each with a new variable. For example, a column with year/month/day information can be separated into invidual columns.
ys <- 1999:2002 ms <- c('Jan', 'Feb', 'Mar') ds <- 1:10 dates <- tidyr::crossing(ys, ms, ds) %>% unite(date, ys:ds, sep = '-')
separate
dates # separate is the inverse of unite dates %>% separate(date, into = c('year', 'month', 'day'), sep = '-')
The sep
argument can take:
rep_value
using sep = '_'
into rep
and value
)a1
using sep = 1
into a
and 1
)Finally the extra
and fill
arguments to separate
control what happens when there are too many and not enough variables.
crossing
crossing
is useful for generating combinations of variables in tibble format. For example, use crossing
to generate combinations of experimental varaibles including sample names, gene names, reaction conditions, and replicates.
genotype <- c('wt', 'mut') gene <- c('IFN', 'ACTIN') time <- c(0, 12, 24, 48) rt <- c('+', '-') # reverse transcriptase added? rep <- 1:3 samples <- tidyr::crossing(genotype, gene, time, rep, rt) samples
Now we'll use tidy data principles to analyze some qPCR data.
Many biological assays make use of the 96 (or 384) well plate. Note the similarity between the plate and a tibble
: there are rows and columns, and each well contains a reaction that will generate one or more data points.
All variables should be systematically listed in your sample names, i.e. name_rep_time_RT
. Systematic naming makes it easy to extract relevant information.
Take this example, where the sample names are a combination of a genotype (WT
and MT
), a time point (0,4,8,24 hour), and a replicate (1,2,3), separated by a hyphen.
library(tidyverse) # for reproducible `sample` set.seed(47681) samples <- tidyr::crossing( name = c('WT', 'MT'), hours = c('t0', 't4', 't8', 't24'), reps = 1:3 ) %>% mutate( value = sample(1:100, n(), replace = TRUE), .id = row_number() ) %>% unite('sample.name', name, hours, reps, sep = '-') %>% select(-.id) samples
Because the samples have systematic names, it is easy to separate this information into individual columns.
sample_info <- samples %>% tidyr::separate( sample.name, into = c('sample', 'hour', 'rep'), sep = "-" ) sample_info
Now we can use dplyr
and tidyr
functions to manipulate the data.
# calculate summary statistics sample_info %>% group_by(sample, hour) %>% summarize(mean(value)) # subtract a background value. N.B.: rearranging the table makes this calculation easy. sample_info %>% spread(hour, value) %>% mutate(t24_norm = t24 - t0)
The class library provides two related tibbles that describe a simulated qPCR experiment called qpcr_names
and qpcr_data
.
library(pbda)
qpcr_names
qpcr_data
We will use tidying concepts to prepare this data for efficient analysis and visualization.
qpcr_data
and qpcr_names
into a structure like:qpcr_names_tidy <- qpcr_names %>% gather(col, value, -row) # `exp` is the relative expression level qpcr_data_tidy <- qpcr_data %>% gather(col, exp, -row) qpcr_data_tidy
qpcr_names_tidy
.qpcr_names_tidy <- separate( qpcr_names_tidy, value, into = c('sample', 'time', 'gene', 'rt', 'rep'), sep = '_' ) qpcr_names_tidy
qpcr_tidy <- left_join(qpcr_names_tidy, qpcr_data_tidy) qpcr_tidy
qpcr_tidy %>% filter(rt == "+") %>% group_by(sample, gene, time) %>% summarize(mean_exp = mean(exp), var_exp = var(exp))
Plot the expression for each gene over time.
Calculate a fold-change for IFN over ACTIN and re-plot.
ggplot(qpcr_tidy, aes(x = time, y = exp, color = rt)) + geom_point(size=3) + facet_wrap(~gene)
Chunk options control the behaviour of code chunks. Some useful settings:
Show but don't run code with eval = FALSE
. Useful if you have chunk that is failing but want to build the rest of the document.
Suppress messages with message = FALSE
and warnings with warnings = FALSE
Code folding lets you embed collapsible code chunks in your rendered HTML document. Please use this for your assignments.
--- title: "Hide my code!" output: html_document: code_folding: hide ---
Organize your code blocks for easy reading.. 80 characters per line, and break pipes into pieces.
# nope mtcars %>% ggplot(aes(x = mpg, y = hp)) + geom_point('red') + geom_line() # yep mtcars %>% ggplot(aes(x = mpg, y = hp)) + geom_point(color = 'red') + geom_line()
# Hypothesis # Approach ## Statistics ## Plots # Interpretations ## Conclusions
fig.cap=
chunk option.library(tidyverse) iris %>% ggplot(aes(Sepal.Length, Sepal.Width)) + geom_point() + facet_grid(~Species)
ggplot(mtcars, aes(x = mpg, y = hp, color = factor(am))) + geom_point() + ggtitle("Motor Trend Cars (colored by transmision type)")
p <- ggplot(iris, aes(x = Petal.Width, y = Petal.Length)) + geom_point(size = 5, aes(color = factor(Species))) p + scale_color_brewer(palette = 'Set1')
library(viridis) p + scale_color_viridis(discrete = TRUE)
Separating your data by groups is a powerful way to visualize differences between them.
mtcars %>% ggplot(aes(x = mpg, y = hp)) + facet_grid(gear ~ am) + geom_point()
The cowplot
package provides a variety of sane defaults for your plots. These are especially useful when making publication-quality figures.
# install.packages('cowplot') library(cowplot) ggplot(mtcars, aes(x = mpg, y = hp)) + geom_point() + facet_grid(~cyl) # note the formatting of the `geom_point` section ggplot(mtcars, aes(x = mpg, y = hp)) + facet_grid(~cyl) + geom_point( data = mtcars %>% select(-cyl), color = "grey", alpha = 0.3 ) + geom_point(color = 'red', size = 2)
The DT
library provides dynamic table with search capabilities.
# Search for "Mazda" library(DT) DT::datatable(mtcars)
knitr::kable()
will display a static table.
knitr::kable(mtcars)
Stack Overflow
Follow #rstats folks on twitter (@drob, @hadleywickham, @JennyBryan)
Create an RMarkdown document for Problem Set 1. Submit your final document as "Quiz 1" on Canvas by Friday at 5 PM.
Your submitted document must knit to HTML without errors. I.e., when you click the "Knit" button, the document should build and display and HTML page.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.