plot_dispersion: Plot Dispersion Estimates

Description

This function plots the mean-dispersion relationship for a given count matrix.

Usage

plot_dispersion(
  dat,
  trans = "log",
  title = "Mean-Dispersion Plot",
  legend = "right",
  ...
)

## S3 method for class 'DGEList'
plot_dispersion(
  dat,
  design = NULL,
  trans = "log",
  size = NULL,
  alpha = NULL,
  title = "Mean-Dispersion Plot",
  legend = "right",
  hover = FALSE
)

## S3 method for class 'DESeqDataSet'
plot_dispersion(
  dat,
  trans = "log",
  size = NULL,
  alpha = NULL,
  title = "Mean-Dispersion Plot",
  legend = "right",
  hover = FALSE
)

## Default S3 method:
plot_dispersion(
  dat,
  design = NULL,
  trans = "log",
  pipeline = NULL,
  size = NULL,
  alpha = NULL,
  title = "Mean-Dispersion Plot",
  legend = "right",
  hover = FALSE
)

Arguments

dat

A DGEList object or DESeqDataSet. If normalization factors or dispersions have not already been estimated for dat, they will be internally computed using the appropriate edgeR or DESeq2 functions. Alternatively, dat may be a raw count matrix, in which case the pipeline argument must specify which package to use for estimating normalization factors and genewise dispersions. A design matrix may be required to calculate adjusted profile log-likelihoods. See Details.

trans

Data transformation to be applied to genewise dispersion estimates. Must be one of "log" or "sqrt".

title

Optional plot title.

legend

Legend position. Must be one of "bottom", "left", "top", "right", "bottomright", "bottomleft", "topleft", or "topright".

design

Optional design matrix with rows corresponding to samples and columns to model coefficients. This will be extracted from dat if available and need not be set explicitly in the call. If provided, however, design will override the relevant slot of dat.

size

Point size.

alpha

Point transparency.

hover

Show probe name by hovering mouse over data point? If TRUE, the plot is rendered in HTML and will either open in your browser's graphic display or appear in the RStudio viewer.

pipeline

Which package should be used to estimate normalization factors and genewise dispersions? Only relevant if dat is a raw count matrix. Must be one of "edgeR" or "DESeq2". Default settings are applied at all steps. For greater control of internal parameters, create the appropriate DGEList or DESeqDataSet object with your preferred settings and pass it directly as dat.
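As a minimal sketch of the "greater control" route described for pipeline, the snippet below builds a DESeqDataSet with a non-default dispersion trend and passes the object itself as dat. The choice of fitType = "local" is only an illustration, and the call to plot_dispersion is guarded because it assumes the bioplotr package is installed.

```r
library(DESeq2)

# Simulated example data shipped with DESeq2
dds <- makeExampleDESeqDataSet()
dds <- estimateSizeFactors(dds)

# Non-default setting chosen for illustration: local regression trend
dds <- estimateDispersions(dds, fitType = "local")

# Pass the prepared object directly as dat (assumes bioplotr is installed)
if (requireNamespace("bioplotr", quietly = TRUE)) {
  bioplotr::plot_dispersion(dds)
}
```

Because the size factors and dispersions are already stored in dds, no pipeline argument is needed here.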

Details

Count data in omic studies are often presumed to follow a negative binomial distribution, which may be uniquely identified by its mean and dispersion parameters. For RNA-seq pipelines that rely on negative binomial generalized linear models (GLMs), such as edgeR and DESeq2, estimating genewise dispersions is therefore an essential step in the model fitting process. Because there are rarely sufficient replicates to reliably infer these values independently for each gene, both packages use empirical Bayes methods to pool information across genes.
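To make the role of the dispersion parameter concrete, the following base R sketch simulates counts for a single hypothetical gene and compares the empirical variance with the negative binomial variance, mu + phi * mu^2. The values of mu and phi are arbitrary assumptions chosen for illustration.

```r
# Simulate negative binomial counts for one hypothetical gene
set.seed(1)
mu  <- 100   # assumed mean count
phi <- 0.2   # assumed dispersion

# rnbinom() uses size = 1 / dispersion in its mean parameterization
x <- rnbinom(1e5, mu = mu, size = 1 / phi)

theoretical_var <- mu + phi * mu^2   # negative binomial variance
empirical_var   <- var(x)

c(theoretical = theoretical_var, empirical = empirical_var)
```

With phi = 0 the variance would reduce to the Poisson value mu, so the dispersion captures the extra variability beyond Poisson sampling noise.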

Details vary between the two pipelines, which is why outputs differ for DGEList objects and DESeqDataSets. edgeR begins by computing a common dispersion parameter for the entire dataset, rendered by plot_dispersion as a blue horizontal line. It then fits a trend line to the maximized negative binomial likelihoods, represented by an orange curve. Finally, it calculates posterior estimates, depicted by black points, using a weighted empirical Bayes likelihood procedure. Be aware that likelihood maximization methods for DGEList objects vary depending on whether a model matrix is supplied. See estimateDisp for more details. A thorough explication of the statistical theory behind this pipeline can be found in the original papers by the package's authors: Robinson & Smyth (2007), Robinson & Smyth (2008), and McCarthy et al. (2012).
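The three edgeR quantities described above can be inspected directly on a DGEList. The sketch below uses simulated toy counts (an assumption made purely for illustration) and standard edgeR calls; whether plot_dispersion reads exactly these slots internally is not confirmed by this page.

```r
library(edgeR)

set.seed(1)
# Hypothetical toy data: 500 genes, 6 samples in two groups
cnts   <- matrix(rnbinom(500 * 6, mu = 50, size = 5), nrow = 500)
group  <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

y <- DGEList(counts = cnts, group = group)
y <- calcNormFactors(y)        # normalization factors (TMM by default)
y <- estimateDisp(y, design)   # common, trended and tagwise dispersions

y$common.dispersion            # one value for the whole dataset
head(y$trended.dispersion)     # trend over average log-CPM
head(y$tagwise.dispersion)     # genewise posterior estimates
```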

DESeq2 also fits a trend line through likelihood estimates of genewise dispersions, depicted by orange points in the plot_dispersion output. Posterior values are calculated following regression toward a log-normal prior with mean equal to the predicted value from the trended fit and variance equal to the difference between the observed variance of the log dispersion residuals around the trend and the expected sampling variance. These maximum a posteriori values are colored blue, while outliers, defined as genes with log dispersion values more than two median absolute deviations above the trend line, are colored red. See estimateDispersions for more details. For more comprehensive statistical background, see the original DESeq paper (Anders & Huber, 2010) and the DESeq2 paper (Love et al., 2014).
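For the DESeq2 side, the genewise estimates, fitted trend, final values, and outlier flags are stored as row metadata once estimateDispersions has run. The sketch below pulls them out of a simulated dataset; it illustrates where these quantities live in a DESeqDataSet and is not a description of how plot_dispersion is implemented.

```r
library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet()
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)

# Collect the per-gene quantities described in the text
cols <- c("baseMean", "dispGeneEst", "dispFit", "dispersion", "dispOutlier")
disp <- as.data.frame(mcols(dds)[, cols])

head(disp)
table(disp$dispOutlier, useNA = "ifany")   # how many genes were flagged
```

Genes flagged in dispOutlier keep their genewise estimate as the final dispersion rather than being shrunk toward the trend.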

plot_dispersion effectively combines edgeR::plotBCV and DESeq2::plotDispEsts into a single function that can take either a DGEList or a DESeqDataSet as its argument and return the matching figure. By default, dispersions are plotted under log10 transform. They may also be displayed under square root transform, in which case the y-axis can be interpreted as the biological coefficient of variation (McCarthy et al., 2012).
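The two transformations can be compared on a few hypothetical dispersion values. This base R sketch only shows the arithmetic behind trans = "log" and trans = "sqrt"; the numbers are made up for illustration.

```r
# Hypothetical genewise dispersions
phi <- c(0.01, 0.04, 0.16, 0.36)

data.frame(
  dispersion       = phi,
  log10_dispersion = log10(phi),   # scale used when trans = "log"
  bcv              = sqrt(phi)     # scale used when trans = "sqrt"
)
```

On the square root scale, a dispersion of 0.16 corresponds to a biological coefficient of variation of 0.4, meaning expression varies by roughly 40% between biological replicates.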

References

Anders, S. & Huber, W. (2010). Differential expression analysis for sequence count data. Genome Biology, 11: R106.

Love, M., Huber, W. & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12): 550.

McCarthy, D.J., Chen, Y. & Smyth, G.K. (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research, 40(10): 4288-4297.

Robinson, M.D. & Smyth, G.K. (2007). Moderated statistical tests for assessing differences in tag abundance. Bioinformatics, 23(21): 2881-2887.

Robinson, M.D. & Smyth, G.K. (2008). Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics, 9(2): 321-332.

See Also

plotDispEsts, plotBCV, estimateDispersions, estimateDisp

Examples

library(bioplotr)
library(DESeq2)

# Simulate a small DESeqDataSet and plot its dispersion estimates
dds <- makeExampleDESeqDataSet()
plot_dispersion(dds)

# Plot the same data using the edgeR pipeline and sqrt transform
des <- model.matrix(~ condition, data = as.data.frame(colData(dds)))
plot_dispersion(counts(dds), design = des,
                pipeline = "edgeR", trans = "sqrt")


dswatson/bioplotr documentation built on March 3, 2023, 9:43 p.m.