edgeR_fork: Empirical Analysis of Digital Gene Expression Data in R


Extracts the most differentially expressed genes (or sequence tags) from a test object, ranked either by p-value or by absolute log-fold-change.

topTags(object, n = 10, adjust.method = "BH", sort.by = "PValue", p.value = 1)

`object`	a `DGEExact` or `DGELRT` object containing test statistics and p-values. Usually created by `exactTest`, `glmLRT`, `glmTreat` or `glmQLFTest`.
`n`	integer, maximum number of genes/tags to return.
`adjust.method`	character string specifying the method used to adjust p-values for multiple testing. See `p.adjust` for possible values.
`sort.by`	character string specifying the sort method. Possibilities are `"PValue"` for p-value, `"logFC"` for absolute log-fold change or `"none"` for no sorting.
`p.value`	numeric cutoff value for adjusted p-values. Only tags with adjusted p-values less than or equal to this value are returned.
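
As a rough illustration of what the `adjust.method` values mean, the following sketch calls base R's `p.adjust` directly on a small vector of p-values. The p-values are made up for the example and do not come from any edgeR analysis.

```r
# Hypothetical raw p-values for five tests (illustration only).
p <- c(0.001, 0.01, 0.02, 0.04, 0.5)

# Benjamini-Hochberg adjustment, which controls the false discovery rate.
fdr <- p.adjust(p, method = "BH")

# Holm adjustment, which controls the family-wise error rate.
fwer <- p.adjust(p, method = "holm")

# Side-by-side view of raw and adjusted values.
print(data.frame(p, fdr, fwer))

# All method names that p.adjust accepts.
print(p.adjust.methods)
```

Adjusted values are never smaller than the raw p-values, so a `p.value` cutoff applied to adjusted p-values is stricter than the same cutoff applied to raw ones.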

This function is closely analogous to the topTable function in the limma package. It accepts a test statistic object created by any of the edgeR functions exactTest, glmLRT, glmTreat or glmQLFTest and extracts a readable data.frame of the most differentially expressed genes. The data.frame collates the annotation and differential expression statistics for the top genes. The data.frame is wrapped in a TopTags output object that records the test statistic used and the multiple testing adjustment method.

TopTags objects have a dim method, so functions such as dim, nrow and ncol work on them. TopTags objects also have a show method so that printing produces a compact summary of their contents.

topTags permits ranking by fold-change, but the authors do not recommend fold-change ranking or fold-change cutoffs for routine RNA-seq analysis. The p-value ranking is intended to be more biologically meaningful, especially if the p-values were computed using glmTreat.

An object of class TopTags, which is a list-based class with the following components:

table

a data.frame containing differential expression results for the top genes in sorted order. The number of rows is the smaller of n and the number of genes with adjusted p-value less than or equal to p.value. The data.frame includes all the annotation columns from object$genes and all statistic columns from object$table plus one of:

`FDR`:	false discovery rate (only when `adjust.method` is `"BH"`, `"BY"` or `"fdr"`)
`FWER`:	family-wise error rate (only when `adjust.method` is `"holm"`, `"hochberg"`, `"hommel"` or `"bonferroni"`).
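
The row selection and column naming described above can be sketched in base R. This is not edgeR's implementation: `top_rows_sketch` and the `toy` table are hypothetical names invented for the illustration, and the sketch covers only the seven adjustment methods listed above.

```r
# Hypothetical helper, NOT part of edgeR: mimics the documented row selection.
top_rows_sketch <- function(tab, n = 10, adjust.method = "BH",
                            sort.by = "PValue", p.value = 1) {
  fdr.methods  <- c("BH", "BY", "fdr")
  fwer.methods <- c("holm", "hochberg", "hommel", "bonferroni")
  adjust.method <- match.arg(adjust.method, c(fdr.methods, fwer.methods))

  # Adjust p-values across all rows before any filtering.
  adj.name <- if (adjust.method %in% fdr.methods) "FDR" else "FWER"
  tab[[adj.name]] <- p.adjust(tab$PValue, method = adjust.method)

  # Order rows by p-value, by absolute log-fold-change, or not at all.
  ord <- switch(sort.by,
    PValue = order(tab$PValue),
    logFC  = order(abs(tab$logFC), decreasing = TRUE),
    none   = seq_len(nrow(tab))
  )
  tab <- tab[ord, , drop = FALSE]

  # Keep rows passing the adjusted p-value cutoff, then truncate to n rows.
  tab <- tab[tab[[adj.name]] <= p.value, , drop = FALSE]
  head(tab, n)
}

# Toy input with made-up statistics for four genes.
toy <- data.frame(
  logFC  = c(2.0, -3.0, 0.5, 1.0),
  PValue = c(0.001, 0.02, 0.6, 0.04),
  row.names = paste0("gene.", 1:4)
)

res <- top_rows_sketch(toy, n = 3, p.value = 0.05)
print(res)
```

Here `n = 3` is requested, but fewer rows come back because the cutoff on the adjusted p-values removes some genes first; the number of rows is the smaller of the two limits.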

adjust.method

character string specifying the method used to adjust p-values for multiple testing, same as input argument.

comparison

character vector giving the names of the two groups being compared (for DGEExact objects) or the glm contrast being tested (for DGELRT objects).

test

character string stating the name of the test.
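
A minimal sketch of how these components might be inspected, assuming edgeR is installed. The counts are simulated as in the Examples section of this page; the seed and the choice of `n = 5` are arbitrary.

```r
library(edgeR)

set.seed(1)  # arbitrary seed, only so the simulated counts are reproducible
y <- matrix(rnbinom(80, size = 1, mu = 10), nrow = 20)
rownames(y) <- paste("gene", 1:20, sep = ".")

d <- DGEList(counts = y, group = rep(1:2, each = 2))
d <- estimateCommonDisp(d)
tp <- topTags(exactTest(d), n = 5)

# The results table is an ordinary data.frame and can be subset as usual.
print(class(tp$table))
print(head(tp$table, 3))

# Metadata recorded alongside the table.
print(tp$adjust.method)  # adjustment method, same as the input argument
print(tp$comparison)     # names of the two groups being compared
print(tp$test)           # name of the test

# dim is defined on TopTags objects and reports the size of the table.
print(dim(tp))
```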

The terms ‘tag’ and ‘gene’ are used synonymously on this page and refer to the rows of object. In general, the rows might be genes, sequence tags, transcripts, exons or whatever type of genomic feature is appropriate for the analysis at hand.

Mark Robinson, Davis McCarthy, Yunshun Chen, Gordon Smyth

Chen Y, Lun ATL, Smyth GK (2016). From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline. F1000Research 5, 1438. http://f1000research.com/articles/5-1438

McCarthy DJ, Chen Y, Smyth GK (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40, 4288-4297. doi: 10.1093/nar/gks042

Robinson MD, Smyth GK (2008). Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics 9, 321-332.

Robinson MD, Smyth GK (2007). Moderated statistical tests for assessing differences in tag abundance. Bioinformatics 23, 2881-2887.

exactTest, glmLRT, glmTreat, glmQLFTest, dim.TopTags, p.adjust.

# load the package that provides DGEList, exactTest and topTags
library(edgeR)

# generate raw counts from NB, create list object
y <- matrix(rnbinom(80,size=1,mu=10),nrow=20)
d <- DGEList(counts=y,group=rep(1:2,each=2),lib.size=rep(c(1000:1001),2))
rownames(d$counts) <- paste("gene",1:nrow(d$counts),sep=".")

# estimate common dispersion and find differences in expression
# here we demonstrate the 'exact' methods, but the use of topTags is
# the same for a GLM analysis
d <- estimateCommonDisp(d)
de <- exactTest(d)

# look at top 10
topTags(de)
# Can specify how many genes to view
tp <- topTags(de, n=15)
# Here we view top 15
tp
# Or order by fold change instead
topTags(de,sort.by="logFC")