Advanced Rank Product/Rank Sum Analysis

Description

The function applies the Rank Product (or Rank Sum) method to identify differentially expressed genes. Either a one-class or a two-class analysis can be performed. Data from different studies (e.g. datasets generated by different laboratories) can also be combined.
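To make the idea behind the statistic concrete, here is a minimal base R sketch for a one-class setting: rank the genes within each replicate, then take the geometric mean of each gene's ranks. It illustrates the general Rank Product concept only and is not the package's implementation; the matrix `toy` and the helper `rank_product_sketch` are hypothetical names introduced for this example, and the ranking direction (rank 1 = smallest value) is an assumption.

```r
# Hypothetical toy data: 4 genes (rows) x 3 replicates (columns) of log ratios.
toy <- matrix(c( 2.0,  1.8,  2.2,
                 0.1, -0.2,  0.0,
                -1.5, -1.7, -1.2,
                 0.5,  0.4,  0.9),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("gene", 1:4), paste0("rep", 1:3)))

# Illustrative helper, not part of RankProd.
rank_product_sketch <- function(x) {
  # Rank genes within each replicate (assumption: rank 1 = smallest value).
  ranks <- apply(x, 2, rank)
  # Geometric mean of the ranks across replicates, one value per gene.
  exp(rowMeans(log(ranks)))
}

rp <- rank_product_sketch(toy)
rp
```

A gene that sits at the same end of the ordering in every replicate gets an extreme rank product, which is what makes the statistic sensitive to consistent changes.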

Usage

RP.advance(data, cl, origin, logged = TRUE, na.rm = TRUE, gene.names = NULL,
plot = FALSE, rand = NULL, calculateProduct = TRUE, MinNumOfValidPairs = NA,
RandomPairs = NA, huge = FALSE, fast = TRUE, tail.time = 0.05)

Arguments

data

the dataset to be analyzed. Every row of this dataset must correspond to a gene

cl

a vector containing the class labels of the samples. In the two-class unpaired case, the label of a sample is either 0 (e.g., control group) or 1 (e.g., case group). For one-class data, the label for each sample should be 1

origin

a vector containing the origin labels of the samples. The label is the same for samples within one lab and different for samples from different labs.

logged

if "TRUE", the data have previously been log-transformed. Otherwise it should be set to "FALSE"

na.rm

if "FALSE", NA values will not be used in computing the ranks. If "TRUE" (default), missing values will be replaced by the genewise median of the non-missing values. Genes with a number of missing values greater than "MinNumOfValidPairs" are still excluded from the analysis

gene.names

if "NULL", no gene names will be attached to the outputs; otherwise it contains the vector of gene names

plot

if "TRUE", plot the estimated pfp vs the rank of each gene

rand

if specified, the random number generator will be put in a reproducible state

calculateProduct

if calculateProduct="TRUE" (default) the rank product method is performed. Otherwise the rank sum method is performed

MinNumOfValidPairs

a parameter that indicates the minimum number of NAs accepted for each gene. If it is set to NA (default), half of the number of replicates is used

RandomPairs

number of random pairs generated in the function. If set to NA (default), the odd integer closest to the square of the number of replicates is used

huge

if "TRUE", not all the outputs are evaluated, in order to save space

fast

if "FALSE", the exact p-values for the Rank Sum are evaluated for any size of the dataset. Otherwise (default), if the dataset is too big, only the p-values that can be computed in "tail.time" minutes (starting from the tail) are evaluated with the exact method; the others are estimated with the Gaussian approximation. If calculateProduct="TRUE" this parameter is ignored

tail.time

the time (default 0.05 min) dedicated to evaluating the exact p-values for the Rank Sum. If calculateProduct="TRUE" this parameter is ignored.
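The missing-value handling described for "na.rm" can be pictured with a few lines of base R. This is a sketch of the idea only, not the package's code: `impute_row_median` and the toy matrix `x` are hypothetical, and how the per-gene NA count is compared with "MinNumOfValidPairs" inside the package is not shown here.

```r
# Hypothetical expression matrix: 3 genes (rows) x 4 samples (columns).
x <- matrix(c(1.0, NA,  3.0, 5.0,
              2.0, 2.5, NA,  NA,
              0.5, 0.7, 0.9, 1.1),
            nrow = 3, byrow = TRUE)

# Illustrative helper, not part of RankProd: replace each missing value
# with the median of the non-missing values in the same row (gene).
impute_row_median <- function(x) {
  for (i in seq_len(nrow(x))) {
    miss <- is.na(x[i, ])
    if (any(miss) && !all(miss)) {
      x[i, miss] <- median(x[i, !miss])
    }
  }
  x
}

filled <- impute_row_median(x)

# Number of missing values per gene in the original data; according to the
# argument description, this count is what "MinNumOfValidPairs" is checked against.
na_per_gene <- rowSums(is.na(x))
```

Counting the NAs on the original matrix, before imputation, keeps the filtering decision independent of the filled-in values.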

Value

A summary of the results obtained by the Rank Product (or Rank Sum) method.

pfp

estimated percentage of false positive predictions (pfp), considering both up-regulated and down-regulated genes

pval

estimated p-values for each gene being up- and down-regulated

RPs/RSs

the Rank Product (or Rank Sum) statistics evaluated for each gene

RPrank/RSrank

rank of the Rank Product (or Rank Sum) of each gene in ascending order

Orirank

ranks obtained when considering each possible pairing. In this version of the package, this is not used to compute Rank Product (or Rank Sum), but it is kept for backward compatibility

AveFC

fold changes of average expressions (class1/class2). log fold-change if data has been log transformed, original fold change otherwise

allrank1

fold change of class 1/class 2 under each origin. log fold-change if data has been log transformed, original fold change otherwise

allrank2

fold change of class 2/class 1 under each origin. log fold-change if data has been log transformed, original fold change otherwise

nrep

total number of replicates

groups

vector of labels (as cl)

RandomPairs_ranks

a matrix containing the ranks evaluated for each RandomPair
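As a rough illustration of how the "pfp" component might be used, the sketch below picks the genes whose estimated pfp falls under a cutoff in each direction. It runs on a mock object rather than a real result: the list `mock`, its column names, and the assumption that "pfp" is a matrix with one column per direction of regulation are all hypothetical and should be checked against an actual return value (for instance with `str()`).

```r
# Mock stand-in for a result list; values and dimnames are invented.
mock <- list(
  pfp = matrix(c(0.01, 0.90,
                 0.40, 0.60,
                 0.95, 0.03),
               ncol = 2, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               c("dir1", "dir2")))
)

cutoff <- 0.05  # illustrative threshold, not a package default

# For each direction (column), keep the names of genes below the cutoff.
hits <- lapply(seq_len(ncol(mock$pfp)), function(j) {
  rownames(mock$pfp)[mock$pfp[, j] < cutoff]
})
names(hits) <- colnames(mock$pfp)
hits
```

For real analyses, the topGene function listed under See Also is probably the more convenient way to extract such gene lists.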

Author(s)

Francesco Del Carratore, francesco.delcarratore@postgrad.manchester.ac.uk
Andris Jankevics, andris.jankevics@gmail.com

References

Breitling, R., Armengaud, P., Amtmann, A., and Herzyk, P. (2004) Rank Products: A simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments, FEBS Letters, 573, 83-92

See Also

topGene, RP, RPadvance, plotRP, RankProducts, RSadvance

Examples

# Load the data of Golub et al. (1999). data(golub) 
# contains a 3051x38 gene expression
# matrix called golub, a vector called golub.cl
# that consists of the 38 class labels,
# and a matrix called golub.gnames whose third column 
# contains the gene names.
data(golub)

##For data with single origin
subset <- c(1:4,28:30)
origin <- rep(1,7)
#identify genes 
RP.out <- RP.advance(golub[,subset],golub.cl[subset],
            origin,plot=FALSE,rand=123)
      
#For data from multiple origins
      
# Load the data arab in the package, which contains 
# the expression of 22,081 genes
# of control and treatment group from the experiments 
# independently conducted at two
#laboratories.
data(arab)
arab.origin #1 1 1 1 1 1 2 2 2 2
arab.cl #0 0 0 1 1 1 0 0 1 1
RP.adv.out <- RP.advance(arab,arab.cl,arab.origin,
                gene.names=arab.gnames,logged=TRUE,rand=123)

attributes(RP.adv.out)
head(RP.adv.out$pfp)
head(RP.adv.out$RPs)
head(RP.adv.out$AveFC)
      
     
     
#Suppose we want to check the consistency of the data
#sets generated in two different
#labs. For example, we would look for genes that were
# measured to be up-regulated in
#class 2 at lab 1, but down-regulated in class 2 at lab 2.
data(arab)
arab.cl2 <- arab.cl

arab.cl2[arab.cl==0 &arab.origin==2] <- 1

arab.cl2[arab.cl==1 &arab.origin==2] <- 0

arab.cl2
##[1] 0 0 0 1 1 1 1 1 0 0


#look for genes differentially expressed
#between hypothetical class 1 and 2
arab.sub=arab[1:500,] ##using subset for fast computation
arab.gnames.sub=arab.gnames[1:500]
Rsum.adv.out <- RP.advance(arab.sub,arab.cl2,arab.origin,
                calculateProduct=FALSE,logged=TRUE,
                gene.names=arab.gnames.sub,rand=123)

attributes(Rsum.adv.out)
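The "AveFC" entry described under Value (a log fold change for logged data, a plain fold change otherwise) can be pictured with the base R sketch below. It is one plausible reading of that description, not the package's code. The matrix `expr` is invented, the values are treated as log2 purely for illustration, and mapping "class1" to label 0 and "class2" to label 1 is an assumption.

```r
# Hypothetical logged expression values: 2 genes (rows) x 6 samples (columns).
expr <- matrix(c(8, 9, 10, 5, 6, 7,
                 4, 4, 4,  4, 4, 4),
               nrow = 2, byrow = TRUE)
cl <- c(0, 0, 0, 1, 1, 1)

# Assumption: "class1" corresponds to label 0 and "class2" to label 1.
mean1 <- rowMeans(expr[, cl == 0, drop = FALSE])
mean2 <- rowMeans(expr[, cl == 1, drop = FALSE])

# Logged data: the fold change is a difference of class means.
log_fc <- mean1 - mean2

# Unlogged data: the fold change is a ratio of class means.
# The values are back-transformed assuming base 2, for illustration only.
raw <- 2^expr
raw_fc <- rowMeans(raw[, cl == 0, drop = FALSE]) /
  rowMeans(raw[, cl == 1, drop = FALSE])
```

The two quantities are on different scales, which is why the "logged" argument needs to match how the data were actually prepared.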