dispBinTrend: Estimate Dispersion Trend by Binning for NB GLMs
In hiraksarkar/edgeR_fork: Empirical Analysis of Digital Gene Expression Data in R


Estimate the abundance-dispersion trend by computing the common dispersion for bins of genes of similar AveLogCPM and then fitting a smooth curve.


dispBinTrend(y, design=NULL, offset=NULL, df = 5, span=0.3, min.n=400,
             method.bin="CoxReid", method.trend="spline", AveLogCPM=NULL,
             weights=NULL, ...)

`y`	numeric matrix of counts
`design`	numeric matrix giving the design matrix for the GLM that is to be fit.
`offset`	numeric scalar, vector or matrix giving the offset (in addition to the log of the effective library size) that is to be included in the NB GLM for the genes. If a scalar, then this value will be used as an offset for all genes and libraries. If a vector, it should have length equal to the number of libraries, and the same vector of offsets will be used for each gene. If a matrix, then each library for each gene can have a unique offset, if desired. In `adjustedProfileLik` the `offset` must be a matrix with the same dimension as the table of counts.
`df`	degrees of freedom for the spline curve. Used when `method.trend="spline"`.
`span`	span used for the loess curve. Used when `method.trend="loess"`.
`min.n`	minimum number of genes in a bin.
`method.bin`	method used to estimate the dispersion in each bin. Possible values are `"CoxReid"`, `"Pearson"` or `"deviance"`.
`method.trend`	type of curve used to smooth the bins. Possible values are `"spline"` for a natural cubic regression spline or `"loess"` for a linear loess curve.
`AveLogCPM`	numeric vector giving average log2 counts per million for each gene
`weights`	optional numeric matrix giving observation weights
`...`	other arguments are passed to `estimateGLMCommonDisp`
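
The three accepted forms of `offset` can be pictured as all expanding to one genes-by-libraries matrix. The following is a minimal base R sketch of that idea only; `expand_offset` is a hypothetical helper written for illustration, not a function from the package, and the library sizes are made-up values.

```r
# Hypothetical helper (not part of edgeR): expand an offset given as a
# scalar, a per-library vector, or a full matrix to a genes x libraries matrix.
expand_offset <- function(offset, ngenes, nlibs) {
  if (is.matrix(offset)) {
    # Matrix: must already match the dimensions of the count table
    stopifnot(nrow(offset) == ngenes, ncol(offset) == nlibs)
    return(offset)
  }
  if (length(offset) == 1L) {
    # Scalar: the same offset for every gene and library
    return(matrix(offset, nrow = ngenes, ncol = nlibs))
  }
  # Vector: one offset per library, repeated for every gene
  stopifnot(length(offset) == nlibs)
  matrix(offset, nrow = ngenes, ncol = nlibs, byrow = TRUE)
}

# Illustrative (assumed) library sizes for three libraries
lib.size <- c(1e6, 2e6, 1.5e6)
off <- expand_offset(log(lib.size), ngenes = 4, nlibs = 3)
dim(off)
```

Each row of `off` holds the same per-library offsets, which corresponds to the vector case described for `offset` above.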

Estimate a dispersion parameter for each of many negative binomial generalized linear models by computing the common dispersion for genes sorted into bins based on overall AveLogCPM. A natural cubic regression spline or a linear loess curve is then used to smooth the trend and to extrapolate a value to each gene.

If there are fewer than `min.n` rows of `y` with at least one positive count, then one bin is used. The number of bins is limited to 1000.
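
The bin-then-smooth idea described above can be sketched in base R without the package. This is an illustration under stated assumptions, not the function's implementation: the abundance measure is a plain log2 mean count standing in for AveLogCPM, the per-bin dispersion is a crude moment estimate rather than the Cox-Reid, Pearson or deviance methods, the number of bins is fixed arbitrarily at 10, and fitting the spline on the square-root scale is a choice made for this sketch.

```r
library(splines)  # ns() for natural cubic splines; distributed with R

set.seed(1)
ngenes <- 2000
nlibs <- 4
mu <- exp(seq(log(5), log(5000), length.out = ngenes))
true.disp <- 0.05 + 2 / mu   # simulated trend: dispersion falls with abundance
y <- matrix(rnbinom(ngenes * nlibs, mu = rep(mu, nlibs),
                    size = rep(1 / true.disp, nlibs)),
            nrow = ngenes, ncol = nlibs)

# Stand-in for AveLogCPM (assumption): log2 of the mean count per gene
abundance <- log2(rowMeans(y) + 0.5)

# Sort genes into bins of similar abundance (10 bins is an arbitrary choice)
nbins <- 10
bin <- cut(rank(abundance, ties.method = "first"), breaks = nbins,
           labels = FALSE)

# Crude per-bin moment estimate from var = mean + disp * mean^2,
# pooled over the genes in each bin and floored at a small positive value
gene.mean <- rowMeans(y)
gene.var <- apply(y, 1, var)
bin.disp <- tapply(seq_len(ngenes), bin, function(i)
  max(sum(gene.var[i] - gene.mean[i]) / sum(gene.mean[i]^2), 1e-4))
bin.abund <- tapply(abundance, bin, mean)

# Smooth the bin-level values with a natural cubic regression spline
bins <- data.frame(abund = as.numeric(bin.abund),
                   sqrt.disp = sqrt(as.numeric(bin.disp)))
fit <- lm(sqrt.disp ~ ns(abund, df = 3), data = bins)

# Read a trended value off the curve for every gene, back on the dispersion scale
trend <- predict(fit, newdata = data.frame(abund = abundance))^2
```

Here `bins$abund` and `bin.disp` play the role of the bin-level values, and `trend` plays the role of the per-gene trended dispersion. Genes whose abundance lies outside the range of the bin means receive values by linear extrapolation of the natural spline.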

list with the following components:

`AveLogCPM`	numeric vector containing the overall AveLogCPM for each gene
`dispersion`	numeric vector giving the trended dispersion estimate for each gene
`bin.AveLogCPM`	numeric vector of length equal to `nbins` giving the average (mean) AveLogCPM for each bin
`bin.dispersion`	numeric vector of length equal to `nbins` giving the estimated common dispersion for each bin

Davis McCarthy and Gordon Smyth

McCarthy, DJ, Chen, Y, Smyth, GK (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40, 4288-4297. doi: 10.1093/nar/gks042

estimateGLMTrendedDisp

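library(edgeR)

# Simulate NB counts for 1000 genes in 4 libraries, with means from 5 to 10000
ngenes <- 1000
nlibs <- 4
means <- seq(5,10000,length.out=ngenes)
y <- matrix(rnbinom(ngenes*nlibs,mu=rep(means,nlibs),size=0.1*means),nrow=ngenes,ncol=nlibs)
# Drop genes with no counts in any library
keep <- rowSums(y) > 0
y <- y[keep,]
# Two groups of two libraries each
group <- factor(c(1,1,2,2))
design <- model.matrix(~group) # Define the design matrix for the full model
# Estimate the binned dispersion trend and plot it on the square-root scale
out <- dispBinTrend(y, design, min.n=100, span=0.3)
with(out, plot(AveLogCPM, sqrt(dispersion)))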