knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

The r Biocpkg("HighlyReplicatedRNASeq") package provides functions to access the count matrix from bulk RNA-seq studies with many replicates. For example,the study from Schurch et al. (2016) has data on 86 samples of S. cerevisiae in two conditions.

Load Data

To load the dataset, call the Schurch16() function. It returns a r Biocpkg("SummarizedExperiment"):

schurch_se <- HighlyReplicatedRNASeq::Schurch16()

schurch_se

An alternative approach that achieves exactly the same is to load the data directly from ExperimentHub

library(ExperimentHub)
eh <- ExperimentHub()
records <- query(eh, "HighlyReplicatedRNASeq")
records[1]           ## display the metadata for the first resource
count_matrix <- records[["EH3315"]]  ## load the count matrix by ID
count_matrix[1:10, 1:5]

Explore Data

It has 7126 genes and 86 samples. The counts are between 0 and 467,000.

summary(c(assay(schurch_se, "counts")))

To make the data easier to work with, I will "normalize" the data. First I divide it by the mean of each sample to account for the differential sequencing depth. Then, I apply the log() transformation and add a small number to avoid taking the logarithm of 0.

norm_counts <- assay(schurch_se, "counts")
norm_counts <- log(norm_counts / colMeans(norm_counts) + 0.001)

The histogram of the transformed data looks very smooth:

hist(norm_counts, breaks = 100)

Finally, let us take a look at the MA-plot of the data and the volcano plot:

wt_mean <- rowMeans(norm_counts[, schurch_se$condition == "wildtype"])
ko_mean <- rowMeans(norm_counts[, schurch_se$condition == "knockout"])

plot((wt_mean+ ko_mean) / 2, wt_mean - ko_mean,
     pch = 16, cex = 0.4, col = "#00000050", frame.plot = FALSE)
abline(h = 0)

pvalues <- sapply(seq_len(nrow(norm_counts)), function(idx){
  tryCatch(
    t.test(norm_counts[idx, schurch_se$condition == "wildtype"], 
         norm_counts[idx, schurch_se$condition == "knockout"])$p.value,
  error = function(err) NA)
})

plot(wt_mean - ko_mean, - log10(pvalues),
     pch = 16, cex = 0.5, col = "#00000050", frame.plot = FALSE)

Reference

Session Info

sessionInfo()


const-ae/HighlyReplicatedRNASeq documentation built on Jan. 19, 2021, 12:46 a.m.