dispBinTrend: Estimate Dispersion Trend by Binning for NB GLMs



Description

Estimate the abundance-dispersion trend by computing the common dispersion for bins of genes of similar AveLogCPM and then fitting a smooth curve.

Usage

dispBinTrend(y, design=NULL, offset=NULL, df = 5, span=0.3, min.n=400,
             method.bin="CoxReid", method.trend="spline", AveLogCPM=NULL,
             weights=NULL, ...)

Arguments

y

numeric matrix of counts

design

numeric matrix giving the design matrix for the GLM that is to be fit.

offset

numeric scalar, vector or matrix giving the offset (in addition to the log of the effective library size) that is to be included in the NB GLM for the genes. If a scalar, then this value will be used as an offset for all genes and libraries. If a vector, it should have length equal to the number of libraries, and the same vector of offsets will be used for each gene. If a matrix, then each library for each gene can have a unique offset, if desired. In adjustedProfileLik the offset must be a matrix with the same dimensions as the table of counts.
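The three accepted shapes of offset can be illustrated with a small base R sketch. The helper expand_offset is hypothetical and is not part of edgeR; it only shows how a scalar or per-library vector corresponds to a full genes-by-libraries matrix of the kind adjustedProfileLik expects.

```r
# Hypothetical helper (not an edgeR function): expand an offset to a
# genes x libraries matrix, following the rules described above.
expand_offset <- function(offset, ngenes, nlibs) {
  if (is.matrix(offset)) {
    # Matrix: must already match the dimensions of the count table.
    stopifnot(all(dim(offset) == c(ngenes, nlibs)))
    return(offset)
  }
  if (length(offset) == 1L) {
    # Scalar: the same offset for every gene and every library.
    return(matrix(offset, nrow = ngenes, ncol = nlibs))
  }
  # Vector: one offset per library, repeated for each gene (row).
  stopifnot(length(offset) == nlibs)
  matrix(offset, nrow = ngenes, ncol = nlibs, byrow = TRUE)
}

lib.size <- c(1e6, 2e6, 1.5e6, 3e6)   # made-up library sizes
off <- expand_offset(log(lib.size), ngenes = 3, nlibs = 4)
dim(off)
```

Every row of off holds the same four per-library values, which is the sense in which "the same vector of offsets will be used for each gene".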

df

degrees of freedom for spline curve.

span

span used for loess curve.

min.n

minimum number of genes in a bin.

method.bin

method used to estimate the dispersion in each bin. Possible values are "CoxReid", "Pearson" or "deviance".

method.trend

type of curve used to smooth the bin estimates. Possible values are "spline" for a natural cubic regression spline or "loess" for a linear loess curve.

AveLogCPM

numeric vector giving average log2 counts per million for each gene

weights

optional numeric matrix giving observation weights

...

other arguments are passed to estimateGLMCommonDisp

Details

Estimate a dispersion parameter for each of many negative binomial generalized linear models by computing the common dispersion for genes sorted into bins based on overall AveLogCPM. A natural cubic regression spline or a linear loess curve is then used to smooth the trend across bins and to extrapolate a value for each gene.

If there are fewer than min.n rows of y with at least one positive count, then one bin is used. The number of bins is limited to 1000.
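The bin-then-smooth idea can be sketched in base R. This is an illustration only, not the edgeR implementation: the abundance measure, the rule for choosing the number of bins, the method-of-moments dispersion and the choice to fit on the log scale are all stand-in assumptions, chosen so that the example runs without edgeR.

```r
library(splines)  # ships with R; provides ns() for natural cubic splines

set.seed(1)
ngenes <- 2000
nlibs <- 4
mu <- exp(runif(ngenes, log(5), log(5000)))
y <- matrix(rnbinom(ngenes * nlibs, mu = rep(mu, nlibs), size = 10),
            nrow = ngenes, ncol = nlibs)
y <- y[rowSums(y) > 0, , drop = FALSE]

# Assumption: a simple stand-in for AveLogCPM.
abundance <- log2(rowMeans(y) + 0.5)

# Assumption: as many equal-sized bins as min.n allows, capped at 1000.
min.n <- 100
nbins <- max(1, min(floor(nrow(y) / min.n), 1000))
bin <- ceiling(rank(abundance, ties.method = "first") * nbins / nrow(y))

# Assumption: method-of-moments dispersion per gene, averaged within each
# bin, as a stand-in for the Cox-Reid common dispersion.
m <- rowMeans(y)
v <- apply(y, 1, var)
mom <- pmax((v - m) / m^2, 0)
bins <- data.frame(
  a = as.numeric(tapply(abundance, bin, mean)),
  d = pmax(as.numeric(tapply(mom, bin, mean)), 1e-4)  # floor keeps log finite
)

# Smooth the bin estimates with a natural cubic regression spline, then
# evaluate the fitted curve at every gene's abundance.
fit <- lm(log(d) ~ ns(a, df = 3), data = bins)
dispersion <- exp(predict(fit, newdata = data.frame(a = abundance)))
```

Here bins$a and bins$d play the roles of bin.AveLogCPM and bin.dispersion, and dispersion is the per-gene trended value read off the smoothed curve.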

Value

list with the following components:

AveLogCPM

numeric vector containing the overall AveLogCPM for each gene

dispersion

numeric vector giving the trended dispersion estimate for each gene

bin.AveLogCPM

numeric vector of length equal to nbins giving the average (mean) AveLogCPM for each bin

bin.dispersion

numeric vector of length equal to nbins giving the estimated common dispersion for each bin

Author(s)

Davis McCarthy and Gordon Smyth

References

McCarthy, DJ, Chen, Y, Smyth, GK (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40, 4288-4297. doi: 10.1093/nar/gks042

See Also

estimateGLMTrendedDisp

Examples

library(edgeR)
ngenes <- 1000
nlibs <- 4
means <- seq(5,10000,length.out=ngenes)
y <- matrix(rnbinom(ngenes*nlibs,mu=rep(means,nlibs),size=0.1*means),nrow=ngenes,ncol=nlibs)
keep <- rowSums(y) > 0
y <- y[keep,]
group <- factor(c(1,1,2,2))
design <- model.matrix(~group) # Define the design matrix for the full model
out <- dispBinTrend(y, design, min.n=100, span=0.3)
with(out, plot(AveLogCPM, sqrt(dispersion)))
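The bin-level components of the returned list can be overlaid on the per-gene trend to check how closely the smooth curve follows the bin estimates. The sketch below repeats the simulation so that it is self-contained, and it is guarded so that it does nothing when edgeR is not installed; the seed and the plotting parameters are arbitrary choices.

```r
if (requireNamespace("edgeR", quietly = TRUE)) {
  set.seed(42)  # arbitrary seed, for reproducibility only
  ngenes <- 1000
  nlibs <- 4
  means <- seq(5, 10000, length.out = ngenes)
  y <- matrix(rnbinom(ngenes * nlibs, mu = rep(means, nlibs), size = 0.1 * means),
              nrow = ngenes, ncol = nlibs)
  y <- y[rowSums(y) > 0, ]
  design <- model.matrix(~factor(c(1, 1, 2, 2)))

  out <- edgeR::dispBinTrend(y, design, min.n = 100)

  # Per-gene trended dispersion (small grey points) ...
  with(out, plot(AveLogCPM, sqrt(dispersion), pch = 16, cex = 0.4, col = "grey40"))
  # ... with the common dispersion estimated in each bin overlaid in red.
  with(out, points(bin.AveLogCPM, sqrt(bin.dispersion), col = "red", pch = 1))
}
```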

hiraksarkar/edgeR_fork documentation built on Dec. 20, 2021, 3:52 p.m.