knitr::opts_chunk$set(echo = TRUE)

Analyze a qPCR experiment

Here is a link to the Rmarkdown for this assignment. Download that and update your name at the top.

You are investigating how the transcription factor MAGIC regulates the expression of two genes WAND and POOF. You use wild-type and mutant lines with different forms of the MAGIC gene and examine the expression of MAGIC, WAND, and POOF. The experiment is a time-course after you add an activator of MAGIC. You collect 4 time points (0, 12, 24, 48 hour) with three biological replicates.

The cells have three mutant forms of MAGIC:

You collect expression data in a qPCR experiment in a 384-well plate (a common format for these experiments). The data in CSV format are available in the class package. One CSV contains expression levels and the other contains sample names coded as cell_time_gene_rt_rep.

This code will give you the path and file names of the CSVs.

# devtools::install_github('rnabioco/eda')
qpcr384_data_csv <- system.file("extdata", "qpcr-data-384.csv", package = 'pbda')
qpcr384_names_csv <- system.file("extdata", "qpcr-names-384.csv", package = 'pbda')

Exercise 1

Use a function from the readr library--part of the tidyverse--to load the CSV data. You should have two new tibbles.


Exercise 2

Inspect these tibbles and make note of their resemblance to a 384 plate. Tidy these tibbles into this format:

# A tibble: 128 x 6
cell  time  gene    rt   exp_mean      exp_var
<chr> <chr> <chr> <chr>      <dbl>        <dbl>
1 MAGIC-hypo     0 ACTIN     -   1.000000 0.000000e+00
2 MAGIC-hypo     0 ACTIN     +   2.333333 4.133333e-01
3 MAGIC-hypo     0 MAGIC     -   1.000000 0.000000e+00
4 MAGIC-hypo     0 MAGIC     +  13.000000 9.750000e+00
5 MAGIC-hypo     0  POOF     -   1.000000 0.000000e+00
6 MAGIC-hypo     0  POOF     + 981.333333 5.717973e+05
7 MAGIC-hypo     0  WAND     -   1.000000 0.000000e+00
8 MAGIC-hypo     0  WAND     +   1.000000 1.900000e-01
9 MAGIC-hypo    12 ACTIN     -   1.000000 0.000000e+00
10 MAGIC-hypo    12 ACTIN     +  10.000000 1.600000e-01
# ... with 118 more rows

Note that this table does not have row and col (they have already been dropped) and that the replicates have been grouped and summarized by their mean (exp_mean) and variance (exp_var).


Exercise 3

Thare are two sets of qPCR reactions: one where reverse transcriptase was added to the RNA sample, and one where it was not. The rt variable reflects this by noting samples with + and -.

Make two plots of the distribution of expression values for all sample values. In one plot, use geom_histogram() and facet by rt. In the other plot, change the x-axis scale using scale_x_log10().

What do these plots tell you about the values from the rt == "-" samples?


Exercise 4

Create a plot of expression versus time for each of the MAGIC cell types. At this point you can remove the rt == "-" controls. You will need to plot expression value on a log-scale to see differences.


Exercise 5

Normalize the expression data dividing each of the MAGIC, WAND, and POOF value by the ACTIN values. You will need to use spread() to rearrange the data for calculation, and then gather() to reformat for plotting. Re-create the plots from question 4 with this normalized data. Did your interpretation change?


Exploratory analysis of biological data

Explore the brauer_gene_exp and yeast_protein_prop data sets to address your own hypothesis. Here is an example.

Hypothesis

Sulfate limitation of yeast causes downregulation of genes containing high levels of cysteine and methionine residues.

Approach

library(tidyverse)
library(pbda)
sulfurs <- yeast_prot_prop %>%
  select(ORF, Cys, Met) %>%
  group_by(ORF) %>%
  mutate(n_sulfur = sum(Cys, Met)) %>%
  ungroup() %>%
  rename(systematic_name = ORF) %>%
  select(systematic_name, n_sulfur)
library(broom)

brauer_gene_exp %>%
  select(systematic_name, nutrient, expression) %>%
  left_join(sulfurs) %>%
  select(-systematic_name) %>%
  group_by(nutrient) %>%
  nest() %>%
  mutate(
    cor = map(data, ~ tidy(cor.test(.$n_sulfur, .$expression)))
    ) %>%
  select(nutrient, cor) %>%
  unnest() %>%
  arrange(-estimate)
library(broom)
models <- brauer_gene_exp %>%
  select(systematic_name:expression) %>%
  group_by(systematic_name, nutrient) %>%
  nest() %>%
  # head() %>% 
  mutate(
    model = map(data, ~ tidy(lm(expression ~ rate, data = .x)))
  ) %>%
  select(-data)

# Now unnest and grab relevant data, i.e. genes with negative slopes
models <- models %>%
  unnest() %>%
  filter(term == "rate" & estimate < 0 & !is.na(nutrient)) %>%
  select(systematic_name, nutrient, estimate)
combined <- left_join(sulfurs, models) %>%
  select(-systematic_name) %>%
  filter(!is.na(nutrient))

library(cowplot)
ggplot(combined, aes(x = n_sulfur, y = estimate)) +
  geom_point() +
  scale_x_log10() +
  facet_grid(~nutrient)

Interpretation



rnabioco/eda documentation built on July 12, 2022, 2:17 p.m.