QL.fit: Fit quasi-likelihood models to matrix of RNA-seq expression...
In QuasiSeq: Analyzing RNA Sequencing Count Tables Using Quasi-Likelihood

View source: R/QL.fit.R

QL.fit

R Documentation

Fit quasi-likelihood models to matrix of RNA-seq expression count data

Description

Fit a quasi-likelihood model to RNA-seq expression count data using the methods detailed in Lund, Nettleton, McCarthy, and Smyth (2012).

Usage

QL.fit(counts, design.list, test.mat=cbind(1L, 2:length(design.list)), 
	   log.offset=NULL, Model="NegBin", print.progress=TRUE, NBdisp="trend", 
	   bias.fold.tolerance=1.10, ...)

Arguments

`counts`	RNA-seq data matrix of integer expression counts. Each row contains observations from a single gene. Each column contains observations from a single experimental unit.
`design.list`	List of design matrices for the full model and reduced model(s). The first element of `design.list` must describe the overall full model, as this design is used to compute deviance residuals for estimating dispersion. One-factor designs may be specified as vectors. The number of rows in each design matrix (or the length of each design vector) must be `ncol(counts)`. Means are modeled with a log link function.
`test.mat`	T by 2 matrix dictating which designs are to be compared, where T is the total number of desired hypothesis tests for each gene. Each row contains two integers, which provide the indices within `design.list` of the two designs to be compared. If `test.mat` is not specified, the default is compare the first design (the full model) to each of the other designs provided in `design.list` in order (i.e. first design compared to second design, first design compared to third design, first design compared to fourth design, etc.). If `length(design.list)=1`, then `test.mat` is ignored.
`log.offset`	A vector of log-scale, additive factors used to adjust estimated log-scale means for differences in library sizes across samples. Commonly used offsets include `log.offset=log(colSums(counts))` and `log.offset=log(apply(counts[rowSums(counts)!=0,],2,quantile,.75))`. The default setting makes no adjustment for library sizes (i.e. log.offset=0).
`Model`	Must be one of "Poisson" or "NegBin", specifying use of a quasi-Poisson or quasi-negative binomial model, respectively.
`print.progress`	logical. If `TRUE`, a text progress bar will be displayed during the fitting of each models in the `design.list`.
`NBdisp`	Used only when `Model="NegBin"`. Must be one of "trend", "common" or a vector of non-negative real numbers with length equal to `nrow(counts)`. Specifying `NBdisp="trend"` or `NBdisp="common"` will use estimateGLMTrendedDisp or `estimateGLMCommonDisp`, respectively, from the package `edgeR` to estimate negative binomial dispersion parameters for each gene. Estimates obtained from other sources can be used by entering `NBdisp` as a vector containing the negative binomial dispersion value to use for each gene when fitting quasi-likelihood model.
`bias.fold.tolerance`	A numerical value no smaller than 1. If the bias reduction of maximum likelihood estimates of (log) fold change is likely to result in a ratio of fold changes greater than this value, then bias reduction will be performed on such genes. Setting `bias.fold.tolerance=Inf` will completely disable bias reduction; setting `bias.fold.tolerance=1` will always perform bias reduction. See `NBDev` for details. Currently, this option is ignored unless `Model="NegBin"`.
`...`	arguments to be passed to the function `estimateGLMTrendedDisp` or `estimateGLMCommonDisp` from the package `edgeR`.

Value

list containing:

`"LRT"`	matrix providing unadjusted likelihood ratio test statistics. Each column contains statistics from a single hypothesis test, applied separately to each gene.
`"LRT.Bart"`	The same as `LRT`, except that the LRT test statistics are adjusted by the Bartlett factor to better match their asymptotic moments.
`"phi.hat.dev"`	vector providing unshrunken, deviance-based estimates of quasi-dispersion (phi) for each gene.
`"phi.hat.pearson"`	vector providing unshrunken, Pearson estimates of quasi-dispersion (phi) for each gene.
`"mn.cnt"`	vector providing average count for each gene.
`"den.df"`	denominator degrees of freedom. Equal to the number of samples minus the number of fitted parameters in the full model, which is specified by the first element of `design.list`
`"num.df"`	vector of numerator degrees of freedom for each test, computed as the difference in the number of fitted parameters between the full and reduced models for each test.
`"Model"`	Either "Poisson" or "NegBin", specifying which model (quasi-Poisson or quasi-negative binomial, respectively) was used.
`"nb.disp"`	Only appears when `Model="NegBin"`. Vector providing negative binomial dispersion parameter estimate used during fitting of quasi-negative binomial model for each gene.
`fitted.values`	matrix of fitted mean values
`coefficients`	matrix of estimated coefficients for each gene. Note that these are given on the log scale. (i.e. intercept coefficient reports log(average count) and non-intercept coefficients report estimated (and bias reduced, when appropriate) log fold-changes.)

Author(s)

Originally authored by Steve Lund lundsp@gmail.com; later modified by Long Qu rtistician@gmail.com

References

Kosmidis and Firth (2009) "Bias reduction in exponential family nonlinear models" Biometrika, 96, 793–804.

Lund, Nettleton, McCarthy and Smyth (2012) "Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates" SAGMB, 11(5).

McCarthy, Chen and Smyth (2012) "Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation" Nucleic Acids Res. 40(10), 4288–97.

Cordeiro (1983) "Improved Likelihood Ratio Statistics for Generalized Linear Models" Journal of the Royal Statistical Society. Series B (Methodological), 45, 404–413.

Examples

 
data(mockRNASeqData)

### Create list of designs describing model under null and alternative hypotheses
design.list=list(
	mockRNASeqData$design.matrix, 
	matrix(1, nrow=rep(mockRNASeqData$nsamples))
)

## Not run: 
### Analyze using QL, QLShrink and QLSpline methods applied to quasi-Poisson model
fit = 
with(mockRNASeqData,
	QL.fit(counts, design.list,
	log.offset=log(estimated.normalization),
	 Model="Poisson", bias.fold.tolerance=Inf)
)
results = QL.results(fit)

### How many significant genes at FDR=.05 from QLSpline method?
apply(results$Q.values[[3]]<.05,2,sum)

### Indexes for Top 10 most significant genes from QLSpline method
head(order(results$P.values[[3]]), 10)

## End(Not run)

## Not run: 
### Analyze using QL, QLShrink and QLSpline methods 
### applied to quasi-negative binomial model
fit2 = 
with(mockRNASeqData,
	QL.fit(counts, design.list,
		log.offset=log(estimated.normalization),
		Model="NegBin")
)
##########################################################
## Note: 'NBdisp' typically will not be specified when  ##
## calling QL.fit while analyzing real data. Providing  ##
## numeric values for 'NBdisp' prevents neg binomial    ##
## dispersions from being estimated from the data.      ##
##########################################################
results2<-QL.results(fit2)

### How many significant genes at FDR=.05 for QLSpline method?
apply(results2$Q.values[[3]]<.05,2,sum)

### Indexes for Top 10 most significant genes from QLShrink method
head(order(results2$P.values[[2]]), 10)


## End(Not run)

QuasiSeq documentation built on Aug. 15, 2022, 5:07 p.m.