toptable: Table of Top Genes from Linear Model Fit
In richierocks/limma2: Linear Models for Microarray Data

Extract a table of the top-ranked genes from a linear model fit.

topTable(fit, coef=NULL, number=10, genelist=fit$genes, adjust.method="BH",
         sort.by="B", resort.by=NULL, p.value=1, lfc=0, confint=FALSE)
toptable(fit, coef=1, number=10, genelist=NULL, A=NULL, eb=NULL, adjust.method="BH",
         sort.by="B", resort.by=NULL, p.value=1, lfc=0, confint=FALSE, ...)
topTableF(fit, number=10, genelist=fit$genes, adjust.method="BH",
         sort.by="F", p.value=1, lfc=0)
topTreat(fit, coef=1, number=10, genelist=fit$genes, adjust.method="BH",
         sort.by="p", resort.by=NULL, p.value=1)

`fit`	list containing a linear model fit produced by `lmFit`, `lm.series`, `gls.series` or `mrlm`. For `topTable`, `fit` should be an object of class `MArrayLM` as produced by `lmFit` and `eBayes`.
`coef`	column number or column name specifying which coefficient or contrast of the linear model is of interest. For `topTable`, can also be a vector of column subscripts, in which case the gene ranking is by F-statistic for that set of contrasts.
`number`	maximum number of genes to list
`genelist`	data frame or character vector containing gene information. For `topTable` only, this defaults to `fit$genes`.
`A`	matrix of A-values or vector of average A-values. For `topTable` only, this defaults to `fit$Amean`.
`eb`	output list from `ebayes(fit)`. If `NULL`, this will be automatically generated.
`adjust.method`	method used to adjust the p-values for multiple testing. Options, in increasing conservatism, include `"none"`, `"BH"`, `"BY"` and `"holm"`. See `p.adjust` for the complete list of options. A `NULL` value will result in the default adjustment method, which is `"BH"`.
`sort.by`	character string specifying statistic to rank genes by. Possible values for `topTable` and `toptable` are `"logFC"`, `"AveExpr"`, `"t"`, `"P"`, `"p"`, `"B"` or `"none"`. (Permitted synonyms are `"M"` for `"logFC"`, `"A"` or `"Amean"` for `"AveExpr"`, `"T"` for `"t"` and `"p"` for `"P"`.) Possibilities for `topTableF` are `"F"` or `"none"`. Possibilities for `topTreat` are as for `topTable` except for `"B"`.
`resort.by`	character string specifying statistic to sort the selected genes by in the output data.frame. Possibilities are the same as for `sort.by`.
`p.value`	cutoff value for adjusted p-values. Only genes with lower p-values are listed.
`lfc`	minimum absolute log2-fold-change required. `topTable` and `topTableF` include only genes with (at least one) absolute log-fold-change greater than `lfc`. `topTreat` does not remove genes but ranks genes by evidence that their log-fold-change exceeds `lfc`.
`confint`	logical, should 95% confidence intervals be output for `logFC`?
`...`	any other arguments are passed to `ebayes` if `eb` is `NULL`

toptable is an earlier interface and is retained only for backward compatibility.

These functions summarize the linear model fit object produced by lmFit, lm.series, gls.series or mrlm by selecting the top-ranked genes for any given contrast. topTable and topTableF assume that the linear model fit has already been processed by eBayes. topTreat assumes that the fit has been processed by treat.

The p-values for the coefficient/contrast of interest are adjusted for multiple testing by a call to p.adjust. The "BH" method, which controls the expected false discovery rate (FDR) below the specified value, is the default adjustment method because it is the most likely to be appropriate for microarray studies. Note that the adjusted p-values from this method are bounds on the FDR rather than p-values in the usual sense. Because they relate to FDRs rather than rejection probabilities, they are sometimes called q-values. See help("p.adjust") for more information.
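As a minimal illustration of the adjustment step, the following base R sketch applies p.adjust with method "BH" to a small vector of p-values and reproduces the result by hand. The p-values are made up for illustration, and the hand computation follows the usual step-up formula; it is not taken from the topTable source.

```r
# Made-up raw p-values (illustrative only, not from a real fit)
p <- c(0.001, 0.01, 0.02, 0.04, 0.5)
n <- length(p)

# Benjamini-Hochberg adjustment as done by stats::p.adjust
adj <- p.adjust(p, method = "BH")

# The same adjustment by hand: scale the i-th smallest p-value by n/i,
# then enforce monotonicity working down from the largest p-value
o <- order(p, decreasing = TRUE)
by_hand <- pmin(1, cummin(n / (n:1) * p[o]))[order(o)]

print(cbind(raw = p, BH = adj, by_hand = by_hand))
```

The two adjusted columns agree, and each adjusted value is at least as large as the corresponding raw p-value.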

Note that if there is no good evidence for differential expression in the experiment, it is quite possible for all the adjusted p-values to be large, or even for all of them to be equal to one. In particular, all the adjusted p-values may equal one if the smallest p-value is no smaller than 1/ngenes, where ngenes is the number of genes with non-missing p-values.
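A small base R sketch of this situation, using made-up p-values whose minimum (0.3) exceeds 1/ngenes (0.25). In this particular example the "holm" adjustment returns one for every gene, while "BH" returns values that are all large but below one; other inputs will behave differently.

```r
# Made-up p-values with no evidence of differential expression:
# the smallest (0.3) is larger than 1/ngenes = 0.25
p <- c(0.3, 0.5, 0.7, 0.9)

holm <- p.adjust(p, method = "holm")
bh   <- p.adjust(p, method = "BH")

print(rbind(raw = p, holm = holm, BH = bh))
```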

The sort.by argument specifies the criterion used to select the top genes. The choices are: "logFC" to sort by the (absolute) coefficient representing the log-fold-change; "A" to sort by average expression level (over all arrays) in descending order; "T" or "t" for absolute t-statistic; "P" or "p" for p-values; or "B" for the lods or B-statistic.

Normally the genes appear in order of selection in the output table. If a different order is wanted, then the resort.by argument may be useful. For example, topTable(fit, sort.by="B", resort.by="logFC") selects the top genes according to log-odds of differential expression and then orders the selected genes by log-ratio in decreasing order. Or topTable(fit, sort.by="logFC", resort.by="logFC") would select the genes by absolute log-fold-change and then sort them from most positive to most negative.
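A minimal sketch of this two-stage ordering, assuming the Bioconductor limma package is installed. The simulated matrix y, the design matrix and the coefficient name "Group" are all invented for illustration and do not come from this help page.

```r
# Sketch only: simulated data, guarded so it is skipped if limma is absent
if (requireNamespace("limma", quietly = TRUE)) {
  library(limma)
  set.seed(1)

  # 100 hypothetical genes on 6 arrays; first 5 genes shifted in arrays 4-6
  y <- matrix(rnorm(100 * 6), nrow = 100, ncol = 6)
  rownames(y) <- paste0("gene", 1:100)
  y[1:5, 4:6] <- y[1:5, 4:6] + 2

  # Two-group design with an intercept and a group effect
  design <- cbind(Intercept = 1, Group = c(0, 0, 0, 1, 1, 1))
  fit <- eBayes(lmFit(y, design))

  # Select the 5 genes with the largest B-statistic,
  # then list them from most positive to most negative logFC
  tab <- topTable(fit, coef = "Group", number = 5,
                  sort.by = "B", resort.by = "logFC")
  print(tab)
}
```

The selection (which five genes) is determined by sort.by alone; resort.by only changes the order in which those rows are printed.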

topTableF ranks genes on the basis of moderated F-statistics for one or more coefficients. If topTable is called and coef has two or more elements, then the specified columns will be extracted from fit and topTableF called on the result. topTable with coef=NULL is the same as topTableF, unless the fitted model fit has only one column.

topTable output for all probes in original (unsorted) order can be obtained by topTable(fit, sort.by="none", number=Inf). However, write.fit or write may be preferable if the intention is to write the results to a file. A related method is as.data.frame(fit), which coerces an MArrayLM object to a data.frame.

By default, the number of probes listed is given by the number argument. Alternatively, by specifying p.value and number=Inf, all genes with adjusted p-values below a specified value can be listed.

The argument lfc gives the ability to filter genes by log-fold change. This argument is not available for topTreat because treat already handles fold-change thresholding in a more sophisticated way.

A data frame with one row for each of the top genes (at most number rows) and the following columns:

`genelist`	one or more columns of probe annotation, if genelist was included as input
`logFC`	estimate of the log2-fold-change corresponding to the effect or contrast (for `topTableF` there may be several columns of log-fold-changes)
`CI.025`	left limit of confidence interval for `logFC` (if `confint=TRUE`)
`CI.975`	right limit of confidence interval for `logFC` (if `confint=TRUE`)
`AveExpr`	average log2-expression for the probe over all arrays and channels, same as `Amean` in the `MArrayLM` object
`t`	moderated t-statistic (omitted for `topTableF`)
`F`	moderated F-statistic (omitted for `topTable` unless more than one coef is specified)
`P.Value`	raw p-value
`adj.P.Val`	adjusted p-value or q-value
`B`	log-odds that the gene is differentially expressed (omitted for `topTreat`)

If fit had unique rownames, then the row.names of the above data.frame are the same in sorted order. Otherwise, the row.names of the data.frame indicate the row number in fit. If fit had duplicated row names, then these are preserved in the ID column of the data.frame, or in ID0 if genelist already contained an ID column.

Gordon Smyth

An overview of linear model and testing functions is given in 06.LinearModels. See also p.adjust in the stats package.