estimateDisp: Estimate Common, Trended and Tagwise Negative Binomial...
In hiraksarkar/edgeR_fork: Empirical Analysis of Digital Gene Expression Data in R


Maximizes the negative binomial likelihood to estimate the common, trended and tagwise dispersions across all tags.

## S3 method for class 'DGEList'
estimateDisp(y, design=NULL, prior.df=NULL, trend.method="locfit", tagwise=TRUE,
            span=NULL, min.row.sum=5, grid.length=21, grid.range=c(-10,10), robust=FALSE, 
            winsor.tail.p=c(0.05,0.1), tol=1e-06, ...)
## S3 method for class 'SummarizedExperiment'
estimateDisp(y, design=NULL, prior.df=NULL, trend.method="locfit", tagwise=TRUE,
            span=NULL, min.row.sum=5, grid.length=21, grid.range=c(-10,10), robust=FALSE, 
            winsor.tail.p=c(0.05,0.1), tol=1e-06, ...)
## Default S3 method:
estimateDisp(y, design=NULL, group=NULL, lib.size=NULL, offset=NULL, prior.df=NULL,
            trend.method="locfit", tagwise=TRUE, span=NULL, min.row.sum=5, grid.length=21, 
            grid.range=c(-10,10), robust=FALSE, winsor.tail.p=c(0.05,0.1), tol=1e-06, weights=NULL, ...)

`y`	matrix of counts, or a `DGEList` object, or a `SummarizedExperiment` object.
`design`	numeric design matrix. Defaults to `model.matrix(~group)` if `group` is specified and otherwise to a single column of ones.
`prior.df`	prior degrees of freedom. It is used in calculating `prior.n`.
`trend.method`	method for estimating dispersion trend. Possible values are `"locfit"` (default), `"none"`, `"movingave"`, `"loess"` and `"locfit.mixed"`, which uses a polynomial of degree 1 for lowly expressed genes.
`tagwise`	logical, should the tagwise dispersions be estimated?
`span`	width of the smoothing window, as a proportion of the data set.
`min.row.sum`	numeric scalar giving a value for the filtering out of low abundance tags. Only tags with total sum of counts above this value are used. Low abundance tags can adversely affect the dispersion estimation, so this argument allows the user to select an appropriate filter threshold for the tag abundance.
`grid.length`	the number of points on which the interpolation is applied for each tag.
`grid.range`	the range of the grid points around the trend on a log2 scale.
`robust`	logical, should the estimation of `prior.df` be robustified against hypervariable genes?
`winsor.tail.p`	numeric vector of length 1 or 2, giving left and right tail proportions of the deviances to Winsorize when estimating `prior.df`.
`tol`	the desired accuracy, passed to `optimize`.
`group`	vector or factor giving the experimental group/condition for each library. Defaults to a vector of ones with length equal to the number of libraries.
`lib.size`	numeric vector giving the total count (sequence depth) for each library.
`offset`	offset matrix for the log-linear model, as for `glmFit`. Defaults to the log-effective library sizes.
`weights`	optional numeric matrix giving observation weights.
`...`	other arguments that are not currently used.
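
As a rough illustration of how some of the arguments above change the call, the following sketch fits the same simulated data with three non-default settings. The simulated counts and the object names (`d.notrend`, `d.robust`, `d.common`) are assumptions made for illustration only; they are not part of the package.

```r
library(edgeR)

# Hypothetical simulated data: 250 tags in 4 libraries, 2 groups
set.seed(1)
y <- matrix(rnbinom(1000, mu=10, size=5), ncol=4)
group <- factor(c(1,1,2,2))
design <- model.matrix(~group)
d <- DGEList(counts=y, group=group)

# No abundance trend: tagwise dispersions are squeezed towards a common value
d.notrend <- estimateDisp(d, design, trend.method="none")

# Robustify the estimation of prior.df against hypervariable genes
d.robust <- estimateDisp(d, design, robust=TRUE)

# Skip the tagwise step: only common and trended dispersions are estimated
d.common <- estimateDisp(d, design, tagwise=FALSE)
is.null(d.common$tagwise.dispersion)
```

The last line checks that no tagwise component is added when `tagwise=FALSE`, as described under Value.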

This function calculates a matrix of likelihoods for each tag at a set of dispersion grid points, and then applies the weighted likelihood empirical Bayes method to obtain posterior dispersion estimates. If there is no design matrix, it calculates the quantile conditional likelihood for each tag and then maximizes it. In this case, it is similar to the functions estimateCommonDisp and estimateTagwiseDisp. If a design matrix is given, it calculates the adjusted profile log-likelihood for each tag and then maximizes it. In this case, it is similar to the functions estimateGLMCommonDisp, estimateGLMTrendedDisp and estimateGLMTagwiseDisp.

Note that the terms ‘tag’ and ‘gene’ are synonymous here.

estimateDisp.DGEList adds the following components to the input DGEList object:

`design`	the design matrix.
`common.dispersion`	estimate of the common dispersion.
`trended.dispersion`	estimates of the trended dispersions.
`tagwise.dispersion`	tagwise estimates of the dispersion parameter if `tagwise=TRUE`.
`AveLogCPM`	numeric vector giving log2(AveCPM) for each row of `y`.
`trend.method`	method for estimating dispersion trend as given in the input.
`prior.df`	prior degrees of freedom. If `robust=TRUE` then `prior.df` is a vector with smaller values assigned to hypervariable outlier genes.
`prior.n`	estimate of the prior weight, i.e. the smoothing parameter that indicates the weight to put on the common likelihood compared to the individual tag's likelihood.
`span`	width of the smoothing window used in estimating dispersions.

estimateDisp.SummarizedExperiment converts the input SummarizedExperiment object into a DGEList object, and then calls estimateDisp.DGEList. The output is a DGEList object.

estimateDisp.default returns a list containing common.dispersion, trended.dispersion, tagwise.dispersion (if tagwise=TRUE), span, prior.df and prior.n.
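
A minimal sketch of the default method, assuming a plain count matrix as input: the result is an ordinary list rather than a DGEList. The simulated counts and the name `fit` are hypothetical and used only for illustration.

```r
library(edgeR)

# Hypothetical simulated count matrix: 250 tags in 4 libraries
set.seed(1)
counts <- matrix(rnbinom(1000, mu=10, size=5), ncol=4)
group <- factor(c(1,1,2,2))
design <- model.matrix(~group)

# The default method takes the matrix directly; lib.size and offset
# are left at their defaults here
fit <- estimateDisp(counts, design=design)

# Inspect the returned list
names(fit)
fit$common.dispersion
head(fit$tagwise.dispersion)
```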

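The estimateDisp function does not give exactly the same estimates as the traditional calling sequences, i.e. estimateCommonDisp followed by estimateTagwiseDisp, or estimateGLMCommonDisp, estimateGLMTrendedDisp and estimateGLMTagwiseDisp.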
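
One way to see the size of the difference is to run both calling sequences on the same data and compare the results. The sketch below does this for the case without a design matrix; the simulated counts and the names `d.old` and `d.new` are assumptions for illustration, and the size of the discrepancy will depend on the data.

```r
library(edgeR)

# Hypothetical simulated data: 250 tags in 4 libraries, 2 groups
set.seed(1)
y <- matrix(rnbinom(1000, mu=10, size=5), ncol=4)
d <- DGEList(counts=y, group=factor(c(1,1,2,2)))

# Traditional calling sequence (no design matrix)
d.old <- estimateCommonDisp(d)
d.old <- estimateTagwiseDisp(d.old)

# Single call to estimateDisp
d.new <- estimateDisp(d)

# Compare the common dispersions and summarize the tagwise differences
c(traditional=d.old$common.dispersion, estimateDisp=d.new$common.dispersion)
summary(d.old$tagwise.dispersion - d.new$tagwise.dispersion)
```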

Yunshun Chen, Gordon Smyth

Chen, Y, Lun, ATL, and Smyth, GK (2014). Differential expression analysis of complex RNA-seq experiments using edgeR. In: Statistical Analysis of Next Generation Sequence Data, Somnath Datta and Daniel S. Nettleton (eds), Springer, New York, pages 51-74. http://www.statsci.org/smyth/pubs/edgeRChapterPreprint.pdf

Phipson, B, Lee, S, Majewski, IJ, Alexander, WS, and Smyth, GK (2016). Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Annals of Applied Statistics 10, 946-963. doi: 10.1214/16-AOAS920

estimateCommonDisp, estimateTagwiseDisp, estimateGLMCommonDisp, estimateGLMTrendedDisp, estimateGLMTagwiseDisp

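library(edgeR)

# Simulate 250 tags in 4 libraries; true dispersion is 1/size = 1/5 = 0.2
y <- matrix(rnbinom(1000, mu=10, size=5), ncol=4)
group <- factor(c(1,1,2,2))
design <- model.matrix(~group)
d <- DGEList(counts=y, group=group)
# Without a design matrix: quantile conditional likelihood
d1 <- estimateDisp(d)
# With a design matrix: adjusted profile log-likelihood
d2 <- estimateDisp(d, design)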