superSeq: Determining sufficient read depth in RNA-Seq experiments

The superSeq package helps determine whether a completed study has sufficient read depth to achieve the desired statistical power. Using superSeq, we can predict how much statistical power would increase with greater read depth, given a fixed number of biological replicates. For more details, see our recent preprint.

Quick start guide

The superSeq package predicts the relationship between statistical power and read depth using subsampling data. We therefore first apply a subsampling procedure to the experiment and then fit the model with the superSeq function. For the subsampling step, we use the computationally efficient subSeq package.
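To make the idea of extrapolating from subsampling data concrete, here is a toy sketch in base R. It is not the superSeq model: a generic Michaelis-Menten saturating curve (the self-starting `SSmicmen` from the stats package) is swapped in purely for illustration, and the simulated values (`true_vm`, `true_k`, the noise level) are hypothetical.

```r
# Toy illustration only: this is NOT the superSeq model.
set.seed(1)

# Subsampling proportions of the full read depth (0.01 to 1)
proportion <- 10 ^ seq(-2, 0, 0.1)

# Hypothetical "true" curve: discoveries saturate as depth grows
true_vm <- 2000   # hypothetical asymptotic number of discoveries
true_k  <- 0.5    # hypothetical proportion giving half the asymptote
discoveries <- true_vm * proportion / (true_k + proportion) +
  rnorm(length(proportion), sd = 5)

# Fit a saturating curve to the simulated subsampling results
fit <- nls(discoveries ~ SSmicmen(proportion, Vm, K))

# Estimated asymptote and a prediction at twice the current depth
asymptote    <- coef(fit)[["Vm"]]
predicted_2x <- as.numeric(predict(fit, newdata = data.frame(proportion = 2)))
```

The point of the sketch is only that discoveries observed at proportions below 1 constrain a curve that can be evaluated beyond 1; the curve superSeq actually fits may differ in form.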

As an example, let's first load the package and apply the subsampling function subsample to the bottomly data set:

library(superSeq)
library(subSeq)
library(Biobase)
data(bottomly)

# Extract count matrix, experimental design and filter low count genes
bottomly_counts <- exprs(bottomly)
bottomly_design <- pData(bottomly)
bottomly_counts <- bottomly_counts[rowSums(bottomly_counts) >= 10, ]

# Generate the subsampling data for this study at specified proportions
bottomly_proportions <- 10 ^ seq(-2, 0, 0.1)
ss <- subsample(counts = bottomly_counts,
                proportions = bottomly_proportions,
                treatment = bottomly_design$strain,
                method = "voomLimma",
                replications = 3,
                seed = 12345)
ss_sum <- summary(ss)
head(ss_sum)
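For intuition about what subsampling at a given proportion means, the sketch below thins a small count matrix in base R. It assumes reads are retained independently with probability equal to the proportion (binomial thinning), which is my reading of how subSeq subsamples; the toy matrix and the helper `thin_counts` are hypothetical and not part of either package.

```r
set.seed(42)

# Hypothetical toy count matrix: 4 genes x 3 samples
counts <- matrix(c(100, 250, 0, 40,
                   120, 300, 5, 35,
                    90, 280, 2, 50),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4),
                                 paste0("sample", 1:3)))

# Same grid as above: log-spaced proportions from 1% to 100% of reads
proportions <- 10 ^ seq(-2, 0, 0.1)

# Hypothetical helper: keep each read independently with probability `proportion`
thin_counts <- function(counts, proportion) {
  thinned <- rbinom(length(counts), size = counts, prob = proportion)
  matrix(thinned, nrow = nrow(counts), dimnames = dimnames(counts))
}

half_depth <- thin_counts(counts, 0.5)

# Per-sample fraction of reads retained (close to 0.5 on average)
colSums(half_depth) / colSums(counts)
```

Because the grid is log-spaced, most proportions fall at low depths, where the number of discoveries typically changes fastest.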

Type ?subSeq for additional details on the subsampling implementation. Now that we have the subsampling results, we can apply the superSeq function and view the predictions from the model using the plot function:

ss_obj <- superSeq(ss_sum)
plot(ss_obj)

It is evident from the above plot that the study is undersaturated, i.e., the study can expect a substantial increase in statistical power from sequencing additional reads. We can extract a summary of our predictions as follows:

summary(ss_obj)

The estimated asymptotic number of discoveries is the expected number of differentially expressed genes when the technical variability is minimized. In this example, the current read depth provides 53.3% of the total power, and doubling or tripling the read depth would provide a 14.3% or 21.8% increase in power, respectively.
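As a small worked example of that percentage, the sketch below assumes that "percent of total power" means the number of discoveries at the current depth divided by the estimated asymptotic number of discoveries. The two counts are hypothetical, chosen only so that the ratio reproduces 53.3%; they are not taken from the bottomly analysis.

```r
# Hypothetical values, not output from summary(ss_obj)
asymptotic_discoveries <- 3000   # assumed estimated asymptote
current_discoveries    <- 1599   # assumed discoveries at full current depth

# Fraction of the achievable discoveries reached at the current depth
power_fraction <- current_discoveries / asymptotic_discoveries
percent_of_total_power <- round(100 * power_fraction, 1)
percent_of_total_power
```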

StoreyLab/superSeq documentation built on June 4, 2019, 7:47 a.m.
