glmTreat: Test for Differential Expression Relative to a Threshold
In hiraksarkar/edgeR_fork: Empirical Analysis of Digital Gene Expression Data in R

Description Usage Arguments Details Value Note Author(s) References See Also Examples

Conduct genewise statistical tests for a given coefficient or contrast relative to a specified fold-change threshold.

1 2	glmTreat(glmfit, coef = ncol(glmfit$design), contrast = NULL, lfc = log2(1.2), null = "interval")

`glmfit`	a `DGEGLM` object, usually output from `glmFit` or `glmQLFit`.
`coef`	integer or character vector indicating which coefficients of the linear model are to be tested equal to zero. Values must be columns or column names of `design`. Defaults to the last coefficient. Ignored if `contrast` is specified.
`contrast`	numeric vector specifying the contrast of the linear model coefficients to be tested against the log2-fold-change threshold. Length must equal to the number of columns of `design`. If specified, then takes precedence over `coef`.
`lfc`	numeric scalar specifying the absolute value of the log2-fold change threshold above which differential expression is to be considered.
`null`	character string, choices are `"worst.case"` or `"interval"`. If `"worst.case"`, then the null hypothesis asssumes that the true logFC is on the boundary of the possible values, either at `lfc` or `-lfc`, whichever gives the largest p-value. This gives the most conservative results. If `"interval"`, then the null hypotheses assumes the true logFC to belong to a bounded interval of possible values.

glmTreat implements a test for differential expression relative to a minimum required fold-change threshold. Instead of testing for genes which have log-fold-changes different from zero, it tests whether the log2-fold-change is greater than lfc in absolute value. glmTreat is analogous to the TREAT approach developed by McCarthy and Smyth (2009) for microarrays.

Note that the lfc testing threshold used to define the null hypothesis is not the same as a log2-fold-change cutoff, as the observed log2-fold-change needs to substantially larger than lfc for the gene to be called as significant. In practice, modest values for lfc such as log2(1.1), log2(1.2) or log2(1.5) are usually the most useful. In practice, setting lfc=log2(1.2) or lfc=log2(1.5) will usually cause most differentially expressed genes to have estimated fold-changes of 2-fold or greater, depending on the sample size and precision of the experiment.

Note also that glmTreat constructs test statistics using the unshrunk log2-fold-changes(unshrunk.logFC) rather than the log2-fold-changes that are usually reported (logFC). If no shrinkage has been applied to the log-fold-changes, i.e., the glms were fitted with prior.count=0, then unshrunk.logFC and logFC are the same and the former is omitted from the output object.

glmTreat detects whether glmfit was produced by glmFit or glmQLFit. In the former case, it conducts a modified likelihood ratio test (LRT) against the fold-change threshold. In the latter case, it conducts a quasi-likelihood (QL) F-test against the threshold.

If lfc=0, then glmTreat is equivalent to glmLRT or glmQLFTest, depending on whether likelihood or quasi-likelihood is being used.

glmTreat with positive lfc gives larger p-values than would be obtained with lfc=0. If null="worst.case", then glmTreat conducts a test closely analogous to the treat function in the limma package. This conducts a test if which the null hypothesis puts the true logFC on the boundary of the [-lfc,lfc] interval closest to the observed logFC. If null="interval", then the null hypotheses assumes an interval of possible values for the true logFC. This approach is somewhat less conservative.

Note that, unlike other edgeR functions such as glmLRT and glmQLFTest, glmTreat can only accept a single contrast. If contrast is a matrix with multiple columns, then only the first column will be used.

glmTreat produces an object of class DGELRT with the same components as for glmfit plus the following:

`lfc`	absolute value of the specified log2-fold-change threshold.
`table`	data frame with the same rows as `glmfit` containing the log2-fold-changes, average log2-counts per million and p-values, ready to be displayed by `topTags`.
`comparison`	character string describing the coefficient or the contrast being tested.

The data frame table contains the following columns:

`logFC`	shrunk log2-fold-change of expression between conditions being tested.
`unshrunk.logFC`	unshrunk log2-fold-change of expression between conditions being tested. Exists only when `prior.count` is not equal to 0 for `glmfit`.
`logCPM`	average log2-counts per million, the average taken over all libraries.
`PValue`	p-values.

glmTreat was previously called treatDGE in edgeR versions 3.9.10 and earlier.

Yunshun Chen and Gordon Smyth

McCarthy, D. J., and Smyth, G. K. (2009). Testing significance relative to a fold-change threshold is a TREAT. Bioinformatics 25, 765-771. doi: 10.1093/bioinformatics/btp053

topTags displays results from glmTreat.

treat is the corresponding function in the limma package, designed for use with normally distributed log-expression data rather than for negative binomial counts.

ngenes <- 100
n1 <- 3
n2 <- 3
nlibs <- n1+n2
mu <- 100
phi <- 0.1
group <- c(rep(1,n1), rep(2,n2))
design <- model.matrix(~as.factor(group))

### 4-fold change for the first 5 genes
i <- 1:5
fc <- 4
mu <- matrix(mu, ngenes, nlibs)
mu[i, 1:n1] <- mu[i, 1:n1]*fc

counts <- matrix(rnbinom(ngenes*nlibs, mu=mu, size=1/phi), ngenes, nlibs)
d <- DGEList(counts=counts,lib.size=rep(1e6, nlibs), group=group)

gfit <- glmFit(d, design, dispersion=phi)
tr <- glmTreat(gfit, coef=2, lfc=1)
topTags(tr)