estimateGLMRobustDisp: Empirical Robust Bayes Tagwise Dispersions for Negative...
In hiraksarkar/edgeR_fork: Empirical Analysis of Digital Gene Expression Data in R

Description Usage Arguments Details Value Author(s) References See Also Examples

Compute a robust estimate of the negative binomial dispersion parameter for each gene, with expression levels specified by a log-linear model, using observation weights. These observation weights will be stored and used later for estimating regression parameters.

estimateGLMRobustDisp(y, design = NULL, prior.df = 10, update.trend = TRUE,
                      trend.method = "bin.loess", maxit = 6, k = 1.345,
                      residual.type = "pearson", verbose = FALSE,
                      record = FALSE)

`y`	a `DGEList` object.
`design`	numeric design matrix, as for `glmFit`.
`prior.df`	prior degrees of freedom.
`update.trend`	logical. Should the trended dispersion be re-estimated at each iteration?
`trend.method`	method (low-level function) used to estimated the trended dispersions. `estimateGLMTrendedDisp`
`maxit`	maximum number of iterations for weighted `estimateGLMTagwiseDisp`.
`k`	the tuning constant for Huber estimator. If the absolute value of residual (r) is less than k, its observation weight is 1, otherwise `k/abs(r)`.
`residual.type`	type of residual (r) used for estimation observation weight
`verbose`	logical. Should verbose comments be printed?
`record`	logical. Should information for each iteration be recorded (and returned as a list)?

Moderation of dispersion estimates towards a trend can be sensitive to outliers, resulting in an increase in false positives. That is, since the dispersion estimates are moderated downwards toward the trend and because the regression parameter estimates may be affected by the outliers, some genes are incorrectly deemed to be significantly differentially expressed. This function uses an iterative procedure where weights are calculated from residuals and estimates are made after re-weighting.

The robustly computed genewise estimates are reported in the tagwise.dispersion vector of the returned DGEList. The terms ‘tag’ and ‘gene’ are synonymous in this context.

Note: it is not necessary to first calculate the common, trended and genewise dispersion estimates. If these are not available, the function will first calculate this (in an unweighted) fashion.

estimateGLMRobustDisp produces a DGEList object, which contains the (robust) genewise dispersion parameter estimate for each gene for the negative binomial model that maximizes the weighted Cox-Reid adjusted profile likelihood, as well as the observation weights. The observation weights are calculated using residuals and the Huber function.

Note that when record=TRUE, a simple list of DGEList objects is returned, one for each iteration (this is for debugging or tracking purposes).

Xiaobei Zhou, Mark D. Robinson

Zhou X, Lindsay H, Robinson MD (2014). Robustly detecting differential expression in RNA sequencing data using observation weights. Nucleic Acids Research, 42(11), e91.

This function calls estimateGLMTrendedDisp and estimateGLMTagwiseDisp.

y <- matrix(rnbinom(100*6,mu=10,size=1/0.1),ncol=6)
d <- DGEList(counts=y,group=c(1,1,1,2,2,2),lib.size=c(1000:1005))
d <- calcNormFactors(d)
design <- model.matrix(~group, data=d$samples) # Define the design matrix for the full model
d <- estimateGLMRobustDisp(d, design)
summary(d$tagwise.dispersion)