plot_dispersion | R Documentation |
This function plots the mean-dispersion relationship for a given count matrix.
plot_dispersion(dat, trans = "log", title = "Mean-Dispersion Plot",
  legend = "right", ...)

## S3 method for class 'DGEList'
plot_dispersion(dat, design = NULL, trans = "log", size = NULL, alpha = NULL,
  title = "Mean-Dispersion Plot", legend = "right", hover = FALSE)

## S3 method for class 'DESeqDataSet'
plot_dispersion(dat, trans = "log", size = NULL, alpha = NULL,
  title = "Mean-Dispersion Plot", legend = "right", hover = FALSE)

## Default S3 method:
plot_dispersion(dat, design = NULL, trans = "log", pipeline = NULL,
  size = NULL, alpha = NULL, title = "Mean-Dispersion Plot",
  legend = "right", hover = FALSE)
dat |
A |
trans |
Data transformation to be applied to genewise dispersion
estimates. Must be one of |
title |
Optional plot title. |
legend |
Legend position. Must be one of |
design |
Optional design matrix with rows corresponding to samples and
columns to model coefficients. This will be extracted from |
size |
Point size. |
alpha |
Point transparency. |
hover |
Show probe name by hovering mouse over data point? If |
pipeline |
Which package should be used to estimate normalization
factors and genewise dispersions? Only relevant if |
Count data in omic studies are often presumed to follow a negative binomial
distribution, which may be uniquely identified by its mean and dispersion
parameters. For RNA-seq pipelines that rely on negative binomial generalized
linear models (GLMs), such as edgeR and DESeq2, estimating genewise
dispersions is therefore an essential step in the model fitting process.
Because there are rarely sufficient replicates to reliably infer these values
independently for each gene, both packages use empirical Bayes methods to
pool information across genes.
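As a minimal sketch of what the mean and dispersion parameters imply, the base R snippet below simulates negative binomial counts and compares their empirical variance with the usual mean-dispersion form, Var(Y) = mu + phi * mu^2. The helper nb_variance and the values of mu and phi are hypothetical, chosen only for illustration.

```r
# Negative binomial variance under the mean-dispersion parameterization:
# Var(Y) = mu + phi * mu^2, where phi is the dispersion.
# nb_variance() is a hypothetical helper, not part of any package.
nb_variance <- function(mu, phi) mu + phi * mu^2

set.seed(1)
mu  <- 100   # illustrative mean
phi <- 0.2   # illustrative dispersion

# rnbinom() is parameterized by size, which equals 1 / dispersion
y <- rnbinom(1e5, mu = mu, size = 1 / phi)

theoretical <- nb_variance(mu, phi)  # 100 + 0.2 * 100^2 = 2100
empirical   <- var(y)                # should be close to the theoretical value
```

With phi = 0 the variance collapses to the mean, which is the Poisson case; larger dispersions add variance that grows quadratically with the mean.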
Details vary between the two pipelines, which is why outputs will differ for
DGEList objects and DESeqDataSets. edgeR begins by computing a common
dispersion parameter for the entire dataset, rendered by plot_dispersion as a
blue horizontal line; then fits a trend line to the maximized negative
binomial likelihoods, represented by an orange curve; and finally calculates
posterior estimates, depicted by black points, using a weighted empirical
Bayes likelihood procedure. Be aware that likelihood maximization methods for
DGEList objects vary depending on whether or not a model matrix is supplied.
See estimateDisp for more details. A thorough explication of the statistical
theory behind this pipeline can be found in the original papers by the
package's authors: Robinson & Smyth (2007), Robinson & Smyth (2008), and
McCarthy et al. (2012).
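The three edgeR quantities described above can be inspected directly after a call to estimateDisp. The following is a sketch under stated assumptions, not a description of how plot_dispersion is implemented: the count matrix is simulated toy data, and the object names (counts, group, design, y) are hypothetical.

```r
library(edgeR)

set.seed(1)
# Hypothetical toy data: 500 genes, 6 samples in two groups of 3
n_genes <- 500
group   <- factor(rep(c("A", "B"), each = 3))
counts  <- matrix(rnbinom(n_genes * 6, mu = 50, size = 5),
                  nrow = n_genes,
                  dimnames = list(paste0("gene", seq_len(n_genes)),
                                  paste0("s", 1:6)))
design  <- model.matrix(~ group)

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)        # normalization factors
y <- estimateDisp(y, design)   # common, trended and tagwise dispersions

y$common.dispersion            # one value for the whole dataset
head(y$trended.dispersion)     # one value per gene, from the fitted trend
head(y$tagwise.dispersion)     # one posterior estimate per gene
```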
DESeq2 also fits a trend line through likelihood estimates of genewise
dispersions, depicted by orange points in the plot_dispersion output.
Posterior values are calculated following regression toward a log-normal
prior with mean equal to the predicted value from the trended fit and
variance equal to the difference between the observed variance of the log
dispersion estimates and the expected sampling variance. These maximum a
posteriori values are colored blue, while outliers, defined as genes with log
dispersion values more than two median absolute deviations away from the
trend line, are colored red. See estimateDispersions for more details. For
more comprehensive statistical background, see the original DESeq paper
(Anders & Huber, 2010) and the DESeq2 paper (Love et al., 2014).
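The corresponding DESeq2 estimates are stored as per-gene metadata columns once estimateDispersions has run. The sketch below uses the simulated dataset from makeExampleDESeqDataSet and only illustrates where the values live; it is not presented as the internals of plot_dispersion, and the object name disp is hypothetical.

```r
library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet()   # simulated example data
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)

disp <- mcols(dds)                 # per-gene metadata columns
head(disp$dispGeneEst)             # genewise likelihood estimates
head(disp$dispFit)                 # fitted values from the trend
head(dispersions(dds))             # final dispersion values used downstream
table(disp$dispOutlier)            # genes flagged as dispersion outliers
```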
plot_dispersion effectively combines edgeR::plotBCV and DESeq2::plotDispEsts
into a single function that can take either a DGEList or a DESeqDataSet as
its argument and return the matching figure. By default, dispersions are
plotted under log10 transform. They may also be displayed under square root
transform, in which case the y-axis can be interpreted as the biological
coefficient of variation (McCarthy et al., 2012).
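To make the two scales concrete, the short base R example below applies both transforms to a handful of made-up dispersion values. The vector dispersion is hypothetical; the only relationship relied upon is that the biological coefficient of variation is the square root of the dispersion.

```r
# Hypothetical dispersion values, for illustration only
dispersion <- c(0.01, 0.04, 0.16, 1)

bcv      <- sqrt(dispersion)    # biological coefficient of variation
log_disp <- log10(dispersion)   # scale used by the default "log" transform

data.frame(dispersion, bcv, log_disp)
```

A dispersion of 0.16, for instance, corresponds to a BCV of 0.4, that is, roughly 40% variation in true abundance between biological replicates.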
Anders, S. & Huber, W. (2010). Differential expression analysis for sequence count data. Genome Biology, 11: R106.
Love, M., Huber, W. & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12): 550.
McCarthy, D.J., Chen, Y. & Smyth, G.K. (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res., 40(10): 4288-4297.
Robinson, M.D. & Smyth, G.K. (2007). Moderated statistical tests for assessing differences in tag abundance. Bioinformatics, 23(21): 2881-2887.
Robinson, M.D. & Smyth, G.K. (2008). Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics, 9(2): 321-332.
plotDispEsts, plotBCV,
estimateDispersions, estimateDisp
library(DESeq2)
dds <- makeExampleDESeqDataSet()
plot_dispersion(dds)

# Plot the same data using the edgeR pipeline and sqrt transform
plot_dispersion(counts(dds),
                design = model.matrix(~ condition, colData(dds)),
                pipeline = "edgeR", trans = "sqrt")