plot_dispersion: Plot Dispersion Estimates
In dswatson/bioplotr: Pretty, simple, optionally interactive plots for bioinformatics analysis pipelines

plot_dispersion

R Documentation

Plot Dispersion Estimates

Description

This function plots the mean-dispersion relationship for a given count matrix.

Usage

plot_dispersion(
  dat,
  trans = "log",
  title = "Mean-Dispersion Plot",
  legend = "right",
  ...
)

## S3 method for class 'DGEList'
plot_dispersion(
  dat,
  design = NULL,
  trans = "log",
  size = NULL,
  alpha = NULL,
  title = "Mean-Dispersion Plot",
  legend = "right",
  hover = FALSE
)

## S3 method for class 'DESeqDataSet'
plot_dispersion(
  dat,
  trans = "log",
  size = NULL,
  alpha = NULL,
  title = "Mean-Dispersion Plot",
  legend = "right",
  hover = FALSE
)

## Default S3 method:
plot_dispersion(
  dat,
  design = NULL,
  trans = "log",
  pipeline = NULL,
  size = NULL,
  alpha = NULL,
  title = "Mean-Dispersion Plot",
  legend = "right",
  hover = FALSE
)

Arguments

`dat`	A `DGEList` object or `DESeqDataSet`. If normalization factors or dispersions have not already been estimated for `dat`, they will be internally computed using the appropriate edgeR or DESeq2 functions. Alternatively, `dat` may be a raw count matrix, in which case the `pipeline` argument must specify which package to use for estimating normalization factors and genewise dispersions. A design matrix may be required to calculate adjusted profile log-likelihoods. See Details.
`trans`	Data transformation to be applied to genewise dispersion estimates. Must be one of `"log"` or `"sqrt"`.
`title`	Optional plot title.
`legend`	Legend position. Must be one of `"bottom"`, `"left"`, `"top"`, `"right"`, `"bottomright"`, `"bottomleft"`, `"topleft"`, or `"topright"`.
`design`	Optional design matrix with rows corresponding to samples and columns to model coefficients. This will be extracted from `dat` if available and need not be set explicitly in the call. If provided, however, `design` will override the relevant slot of `dat`.
`size`	Point size.
`alpha`	Point transparency.
`hover`	Show probe name by hovering mouse over data point? If `TRUE`, the plot is rendered in HTML and will either open in your browser's graphic display or appear in the RStudio viewer.
`pipeline`	Which package should be used to estimate normalization factors and genewise dispersions? Only relevant if `dat` is a raw count matrix. Must be one of `"edgeR"` or `"DESeq2"`. Default settings are applied at all steps. For greater control of internal parameters, create the appropriate `DGEList` or `DESeqDataSet` object with your preferred settings and pass it directly as `dat`.
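The last point under `pipeline` can be illustrated with a minimal sketch: build a `DGEList` with non-default settings and pass it directly as `dat`. The simulated counts, the two-group layout, and the particular choices of `method = "RLE"` and `robust = TRUE` are assumptions made for illustration only, not recommended settings.

```r
library(edgeR)

set.seed(1)
# Hypothetical data: 1000 genes, 6 samples in two groups of three
counts <- matrix(rnbinom(1000 * 6, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
group  <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

# Non-default choices made explicitly rather than via `pipeline = "edgeR"`
y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y, method = "RLE")      # default is "TMM"
y <- estimateDisp(y, design, robust = TRUE)  # default is robust = FALSE

# Dispersions are already stored in `y`, so nothing needs re-estimating here
if (requireNamespace("bioplotr", quietly = TRUE)) {
  bioplotr::plot_dispersion(y, trans = "sqrt", legend = "bottomright")
}
```

Because `estimateDisp` stores the design matrix in the returned object, `design` does not need to be repeated in the call to plot_dispersion.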

Details

Count data in omic studies are often presumed to follow a negative binomial distribution, which may be uniquely identified by its mean and dispersion parameters. For RNA-seq pipelines that rely on negative binomial generalized linear models (GLMs), such as edgeR and DESeq2, estimating genewise dispersions is therefore an essential step in the model fitting process. Because there are rarely sufficient replicates to reliably infer these values independently for each gene, both packages use empirical Bayes methods to pool information across genes.
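To make the role of the dispersion parameter concrete, the sketch below simulates negative binomial counts in base R and checks the mean-variance relationship var = mu + phi * mu^2, which is the parameterization used by edgeR and DESeq2. The values of `mu` and `phi` are arbitrary assumptions, and the method-of-moments estimate is shown only for intuition; it is not how either package estimates dispersions.

```r
set.seed(1)
mu  <- 100    # mean count (assumed value)
phi <- 0.25   # dispersion (assumed value)

# rnbinom() is parameterized by `size`, the reciprocal of the dispersion
x <- rnbinom(1e5, mu = mu, size = 1 / phi)

# Variance implied by the negative binomial model
theoretical_var <- mu + phi * mu^2
empirical_var   <- var(x)

# Naive method-of-moments dispersion estimate, for intuition only
phi_hat <- (empirical_var - mean(x)) / mean(x)^2

c(theoretical = theoretical_var, empirical = empirical_var, phi_hat = phi_hat)
```

With 100,000 draws the moment estimate sits close to the true dispersion. With the handful of replicates typical of an RNA-seq experiment it would be far noisier, which is the motivation for pooling information across genes.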

Details vary between the two pipelines, which is why outputs differ for DGEList and DESeqDataSet objects. edgeR proceeds in three steps. It first computes a common dispersion parameter for the entire dataset, rendered by plot_dispersion as a blue horizontal line. It then fits a trend line to the maximized negative binomial likelihoods, represented by an orange curve. Finally, it calculates posterior estimates, depicted by black points, using a weighted empirical Bayes likelihood procedure. Be aware that likelihood maximization methods for DGEList objects vary depending on whether a model matrix is supplied. See estimateDisp for more details. A thorough explication of the statistical theory behind this pipeline can be found in the original papers by the package's authors: Robinson & Smyth (2007), Robinson & Smyth (2008), and McCarthy et al. (2012).
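The three edgeR quantities described above can be inspected directly. The following is a sketch on simulated counts (the data and group layout are assumptions). The mapping of each component to a plot element follows the description in this section and has not been checked against the bioplotr source.

```r
library(edgeR)

set.seed(1)
n_genes <- 1000
# Hypothetical counts: 1000 genes, 6 samples in two groups of three
counts <- matrix(rnbinom(n_genes * 6, mu = 50, size = 5), nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("s", 1:6)))
group  <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
y <- estimateDisp(y, design)

y$common.dispersion         # single value (blue horizontal line)
head(y$trended.dispersion)  # one value per gene (orange curve)
head(y$tagwise.dispersion)  # posterior genewise estimates (black points)
```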

DESeq2 also fits a trend line through likelihood estimates of genewise dispersions, depicted by orange points in the plot_dispersion output. Posterior values are calculated following regression toward a log-normal prior with mean equal to the predicted value from the trended fit and variance equal to the difference between the observed variance of the log dispersion estimates and the expected sampling variance. These maximum a posteriori values are colored blue, while outliers, defined as genes with log dispersion values more than two median absolute deviations away from the trend line, are colored red. See estimateDispersions for more details. For more comprehensive statistical background, see the original DESeq paper (Anders & Huber, 2010) and the DESeq2 paper (Love et al., 2014).
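The corresponding DESeq2 quantities are stored in the row metadata of the DESeqDataSet after dispersion estimation. A minimal sketch using DESeq2's own simulated example data is given below. The column names are those written by DESeq2, while their correspondence to the colors described above is inferred from this section rather than from the bioplotr source.

```r
library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet()
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)

disp <- mcols(dds)
head(disp$dispGeneEst)   # genewise likelihood estimates
head(disp$dispFit)       # fitted values from the trend
head(dispersions(dds))   # final dispersions used in model fitting
table(disp$dispOutlier)  # genes flagged as dispersion outliers
```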

plot_dispersion effectively combines edgeR::plotBCV and DESeq2::plotDispEsts into a single function that can take either a DGEList or a DESeqDataSet as its argument and return the matching figure. By default, dispersions are plotted under log10 transform. They may also be displayed under square root transform, in which case the y-axis can be interpreted as the biological coefficient of variation (McCarthy et al., 2012).
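The square root interpretation is simple arithmetic: the biological coefficient of variation (BCV) is the square root of the negative binomial dispersion. The short base R sketch below uses made-up dispersion values purely for illustration.

```r
# Hypothetical genewise dispersions
dispersion <- c(0.01, 0.04, 0.16)

# BCV is the square root of the dispersion
bcv <- sqrt(dispersion)
bcv
```

A dispersion of 0.16 therefore corresponds to a BCV of 0.4, meaning that true expression levels vary by roughly 40% between biological replicates.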

References

Anders, S. & Huber, W. (2010). Differential expression analysis for sequence count data. Genome Biology, 11: R106.

Love, M., Huber, W. & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12): 550.

McCarthy, D.J., Chen, Y. & Smyth, G.K. (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res., 40(10): 4288-4297.

Robinson, M.D. & Smyth, G.K. (2007). Moderated statistical tests for assessing differences in tag abundance. Bioinformatics, 23(21): 2881-2887.

Robinson, M.D. & Smyth, G.K. (2008). Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics, 9(2): 321-332.

Examples

library(bioplotr)
library(DESeq2)

# Simulate a small DESeqDataSet and plot its dispersion estimates
dds <- makeExampleDESeqDataSet()
plot_dispersion(dds)

# Plot the same data using the edgeR pipeline and sqrt transform
plot_dispersion(counts(dds), design = model.matrix(~ condition, colData(dds)),
                pipeline = "edgeR", trans = "sqrt")

dswatson/bioplotr documentation built on March 3, 2023, 9:43 p.m.
