rankProductDiffExpress: Rank Product across multiple Samples

Use the Rank Product method to find differentially expressed genes, or to prioritize the order of a set of genes.


rankProduct(rankM, nSimulations = 500)

rankProductDiffExpress(fnames, groupSet, targetGroup = groupSet[1], 
		geneColumn = "GENE_ID", intensityColumn = "INTENSITY", 
		productColumn = "PRODUCT", offset = 0, keepIntergenics = FALSE, 
		average.FUN = logmean, poolSet = rep(1, length(fnames)), 
		nSimulations = 500, missingGenes = c("drop", "fill"))



rankM: numeric matrix of gene ranks, with GeneIDs as the row names and SampleIDs as the column names


fnames: character vector of full pathnames to existing transcriptome files


groupSet: character vector of GroupIDs or conditions, to categorize the transcriptomes


targetGroup: the one GroupID to be the chosen subset, against which all other groups are compared. This is the group that is being tested for up-regulation.


geneColumn: name of the column of GeneIDs


intensityColumn: name of the column of intensity values


productColumn: name of the column that has gene product descriptions


offset: a linear offset added to all intensity values, to prevent division by zero and/or extreme fold change ratios


keepIntergenics: logical; explicitly keep the intergenic regions (non-genes), or drop them from consideration


average.FUN: the averaging function for combining gene intensities within subset groups, gene ranks, and gene RP values


poolSet: numeric vector of sample pools or tiers, for restricting 2-sample DE tests to samples from comparable tiers. See details.


nSimulations: number of simulations of random permutations of the data, for calculating false positive rates


missingGenes: method for dealing with genes that are not present in every transcriptome file. "drop" removes the entire gene row; "fill" substitutes the minimum observed intensity.


This function implements the Rank Product algorithm of Breitling et al. By performing all possible 2-sample DE comparisons and ranking genes by fold change in each one, it calculates a family of rank positions for each gene. The algorithm then turns those ranks into a single measure called the Rank Product (RP), which reflects how likely it is that a gene could rank that high across that many DE comparisons by chance.
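The steps above can be sketched in a few lines of base R. This is an illustration only, not the code behind rankProductDiffExpress: the intensity values and sample names are made up, no offset is applied, and the RP is taken here as the geometric mean of each gene's ranks, which is an assumption about how the ranks are combined.

```r
# Toy intensities: 4 genes x 4 samples (made-up numbers, for illustration only)
intensity <- matrix(c(100, 120,  10,  12,
                       50,  55,  48,  52,
                       10,  12,  90, 110,
                       30,  28,  15,  14),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(paste0("gene", 1:4),
                                    c("A1", "A2", "B1", "B2")))
groups <- c("A", "A", "B", "B")
target <- "A"

# Every target-vs-other sample pair is one 2-sample DE comparison
pairs <- expand.grid(t = which(groups == target),
                     o = which(groups != target))

# Rank genes by fold change in each comparison (rank 1 = most up in target)
rankM <- sapply(seq_len(nrow(pairs)), function(k) {
  fold <- intensity[, pairs$t[k]] / intensity[, pairs$o[k]]
  rank(-fold, ties.method = "average")
})
rownames(rankM) <- rownames(intensity)

# Assumption: RP is the geometric mean of a gene's ranks over all comparisons
rp <- exp(rowMeans(log(rankM)))
```

A gene that is near the top of every comparison gets a small RP, so sorting by increasing rp puts the most consistently up-regulated genes first.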

For the simple case of rankProduct, given a matrix of ranks, the algorithm just measures RP and estimates the false positive rates.
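As a rough picture of what measuring RP and estimating false positive rates from a rank matrix can look like, here is a minimal sketch in base R. It follows the permutation idea from the reference (shuffle each column, recompute RP, count chance hits); whether rankProduct does exactly this is an assumption, and the rank matrix, the helper geoMean, and the output column names are all hypothetical.

```r
set.seed(1)

# Hypothetical rank matrix: 6 genes x 4 comparisons; gene1 is always on top
rankM <- cbind(c(1, 2, 3, 4, 5, 6),
               c(1, 3, 2, 5, 4, 6),
               c(1, 2, 4, 3, 6, 5),
               c(1, 4, 2, 3, 5, 6))
rownames(rankM) <- paste0("gene", 1:6)

# Assumption: RP is the geometric mean of a gene's ranks
geoMean <- function(x) exp(mean(log(x)))
rp <- apply(rankM, 1, geoMean)

# Null distribution: shuffle each column independently, recompute all RP values
nSimulations <- 200
nullRP <- replicate(nSimulations,
                    apply(apply(rankM, 2, sample), 1, geoMean))

# Expected number of genes per simulation with an RP at least this good
eValue <- sapply(rp, function(v) sum(nullRP <= v) / nSimulations)

# False positive rate: expected chance hits divided by the gene's position by RP
fpRate <- eValue / rank(rp, ties.method = "max")

# Rows stay in the same order as the input matrix (column names are illustrative)
out <- data.frame(RP = rp, AvgRank = rowMeans(rankM), E = eValue, FP = fpRate)
```

More simulations give smoother estimates at the cost of run time, which is the trade-off controlled by nSimulations.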

By default, each sample is compared to all other samples that are not from its group. If more restrictions are warranted, the poolSet argument can be used to assign a pool or tier to each sample. In that case, only pairs of samples that come from the same pool but from different groups go forward into the 2-sample tests.
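The effect of poolSet on which 2-sample tests are run can be sketched as follows. The sample names, groups, and pools are hypothetical, and the pairing logic is an illustration of the rule described above rather than the function's internal code.

```r
# Hypothetical sample annotations (names and values are illustrative)
samples     <- c("s1", "s2", "s3", "s4", "s5", "s6")
groupSet    <- c("A",  "A",  "B",  "B",  "A",  "B")
poolSet     <- c(1,    1,    1,    2,    2,    2)
targetGroup <- "A"

# All target-vs-other sample pairs: the default when every pool is the same
pairs <- expand.grid(target = which(groupSet == targetGroup),
                     other  = which(groupSet != targetGroup))

# With pools: keep only pairs whose two samples share a pool
keep  <- poolSet[pairs$target] == poolSet[pairs$other]
pairs <- pairs[keep, ]

comparisons <- paste(samples[pairs$target], samples[pairs$other], sep = " vs ")
```

With these inputs, the nine possible A-versus-B pairs are reduced to the four that stay within a pool.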


For rankProduct, a data frame with RP values, average ranks, and false positive rates for each gene is returned, in the same row order as the input matrix.

For rankProductDiffExpress, a data frame of consensus gene differential expression, sorted by RP value, with columns:


the genes, sorted from most up-regulated to most down-regulated


the gene product descriptions


the average fold change for each gene


the Rank Product value. See calcRP


the average rank position over all 2-sample DE tests, for each gene


the average gene intensity over all samples in group targetGroup


the average gene intensity over all samples in the other groups


the expected number of genes to have an RP value this good, by chance


the rate of false positive DE genes, given this RP value


Typically, this function is called once for each group, to get all possible DE comparisons between the various groups. While the function explicitly measures up-regulation, by reversing the order of the rows of the result, you get the answer for down-regulation.
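Reversing the row order takes one line of base R. The data frame below is a stand-in for a result; its column names and values are placeholders, not the function's actual output.

```r
# Stand-in for a result sorted from most up-regulated to most down-regulated
# (column names and values are placeholders)
up <- data.frame(gene = c("g1", "g2", "g3", "g4"),
                 fold = c(3.2, 1.1, -0.4, -2.7),
                 stringsAsFactors = FALSE)

# Reverse the rows so the most down-regulated genes come first
down <- up[rev(seq_len(nrow(up))), ]
rownames(down) <- NULL
```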


Bob Morrison


Rainer Breitling et al., FEBS Letters 573 (2004)

