OrderedList: Detecting Similarities of Two Microarray Studies



Description

OrderedList performs a comparison of comparisons: given two expression studies, each summarized by one ranked (ordered) list of genes, we might observe considerable overlap among the top-scoring genes. OrderedList quantifies this overlap by computing a weighted similarity score in which the top-ranking genes contribute more than the genes further down the list. The final list of overlapping genes consists of those probes that together contribute a given percentage of the overall similarity score.

Usage

OrderedList(eset, B = 1000, test = "z", beta = 1, percent = 0.95,
            verbose = TRUE, alpha = NULL, min.weight = 1e-5, empirical = FALSE)

Arguments

eset

Expression set containing the two studies of interest. Use prepareData to generate eset.

B

Number of internal sub-samples needed to optimize alpha.

test

String, one of 'fc' (log ratio = log fold change), 't' (t-test with equal variances) or 'z' (t-test with regularized variances). The z-statistic is implemented as described in Efron et al. (2001).

beta

Either 1 or 0.5. If the class labels of the two studies match, set beta=1. For example, in each study the first class relates to bad prognosis while the second class relates to good prognosis. If no such matching is possible, set beta=0.5. For example, we compare a study with good/bad prognosis classes to a study in which the classes are two types of cancer tissue.

percent

The final list of overlapping genes consists of those probes that contribute a certain percentage to the overall similarity score. Default is percent=0.95. To get the full list of genes, set percent=1.

verbose

Logical value for message printing.

alpha

A vector of weighting parameters. If set to NULL (the default), the parameters are computed such that the top 100 to the top 2500 ranks receive weights above min.weight.

min.weight

The minimal weight to be taken into account while computing scores.

empirical

If TRUE, empirical confidence intervals will be computed by randomly permuting the class labels of each study. Otherwise, a hypergeometric distribution is used. Confidence intervals appear when using plot.OrderedList.
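
To make the relationship between alpha and min.weight concrete: under the exponential weighting exp(-alpha * n) given in Details, rank n receives a weight of at least min.weight exactly when alpha <= -log(min.weight) / n. The sketch below computes the alpha values at which the weight drops to min.weight at rank 100 and at rank 2500. The helper name alpha_for_depth is hypothetical, and this only illustrates the formula; it is not necessarily how OrderedList builds its default grid of alpha values.

```r
# Hypothetical helper: the alpha at which exp(-alpha * depth) equals min.weight.
alpha_for_depth <- function(depth, min.weight = 1e-5) {
  -log(min.weight) / depth
}

alpha_top100  <- alpha_for_depth(100)   # steep decay: roughly the top 100 ranks count
alpha_top2500 <- alpha_for_depth(2500)  # shallow decay: roughly the top 2500 ranks count

# The weight at the boundary rank equals min.weight (up to rounding).
exp(-alpha_top100 * 100)
exp(-alpha_top2500 * 2500)
```

A larger alpha concentrates the score on fewer top ranks, which is why a range of candidate values is worth comparing.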

Details

In short, the similarity measure is computed as follows: Based on two-sample test statistics like the t-test, genes within each study are ranked from most up-regulated down to most down-regulated. Thus we have one ordered list per study. Now for each rank going both from top (up-regulated end) and from bottom (down-regulated end) we count the number of overlapping genes. The total overlap A_n for rank n is defined as:

A_n = O_n (G_1,G_2) + O_n(f(G_1),f(G_2))

where G_1 and G_2 are the two ordered lists, f(G_1) and f(G_2) are the two flipped lists with the down-regulated genes on top, and O_n is the size of the overlap among the first n entries of its two arguments. A preliminary version of the weighted overlap over all ranks n is then given as:

T_α(G_1,G_2) = ∑_n exp(-α n) A_n.

The final similarity score includes the case that we cannot match the classes in each study exactly and thus do not know whether up-regulation in one list corresponds to up- or down-regulation in the other list. Here parameter β comes into play:

S_α(G_1,G_2) = max{ β T_α(G_1,G_2), (1-β) T_α(G_1,f(G_2)) }.

Parameter β is set by the user but parameter α has to be tuned in a simulation using sub-samples and permutations of the original class labels.
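
The formulas above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the package's implementation: the function names overlap_sizes, weighted_overlap and similarity_score are hypothetical, the gene IDs and scores are made up, both lists are assumed to contain the same genes, and alpha is fixed by hand instead of being tuned.

```r
# Size of the overlap among the top n entries of two ordered lists, for n = 1..N.
overlap_sizes <- function(g1, g2) {
  vapply(seq_along(g1),
         function(n) length(intersect(g1[seq_len(n)], g2[seq_len(n)])),
         integer(1))
}

# Preliminary weighted overlap T_alpha: sum over ranks of exp(-alpha * n) * A_n,
# where A_n adds the overlap from the top and from the bottom (flipped lists).
weighted_overlap <- function(g1, g2, alpha) {
  a_n <- overlap_sizes(g1, g2) + overlap_sizes(rev(g1), rev(g2))
  sum(exp(-alpha * seq_along(a_n)) * a_n)
}

# Final score S_alpha: compare the lists as given and with the second list flipped.
similarity_score <- function(g1, g2, alpha, beta = 1) {
  max(beta * weighted_overlap(g1, g2, alpha),
      (1 - beta) * weighted_overlap(g1, rev(g2), alpha))
}

# Toy data (made up): rank genes from most up- to most down-regulated.
scores1 <- c(geneA = 3.2, geneB = 2.1, geneC = 0.4, geneD = -1.0, geneE = -2.7)
scores2 <- c(geneA = 2.8, geneB = 0.1, geneC = 1.9, geneD = -2.2, geneE = -0.9)
list1 <- names(sort(scores1, decreasing = TRUE))
list2 <- names(sort(scores2, decreasing = TRUE))

similarity_score(list1, list2, alpha = 0.5, beta = 1)    # class labels match
similarity_score(list1, list2, alpha = 0.5, beta = 0.5)  # class labels cannot be matched
```

With beta = 1 the flipped comparison receives weight zero, so the score reduces to T_α(G_1,G_2); with beta = 0.5 both orientations are weighted equally and the larger one determines the score.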

Value

Returns an object of class OrderedList, which consists of a list with entries:

n

Total number of genes.

label

The concatenated study labels as provided by eset.

p

The p-value specifying the significance of the similarity.

intersect

Vector with sorted probe IDs of the overlapping genes, which contribute percent to the overall similarity score.

alpha

The optimal regularization parameter alpha.

direction

Numerical value. Returns '1' if the similarity score is higher for the originally ordered lists and '-1' if the score is higher for the comparison of one original to one flipped list. Of special interest if beta=0.5.

scores

Matrix of observed test scores with genes in rows and studies in columns.

sim.scores

List with four elements holding the output of the resampling with the optimal alpha. SIM.observed: The observed similarity score. SIM.alternative: Vector of observed similarity scores simulated using sub-sampling within the distinct classes of each study. SIM.random: Vector of random similarity scores simulated by randomly permuting the class labels of each study. subSample: TRUE to indicate that sub-sampling was used.

pauc

Vector with pAUC-scores for each candidate of the regularization parameter α. The maximal pAUC-score defines the optimal α. See also plot.OrderedList.

call

List with some of the input parameters.

empirical

List with confidence interval values. Is NULL if empirical=FALSE.

Author(s)

Xinan Yang, Claudio Lottaz, Stefanie Scheid

References

Yang X, Bentink S, Scheid S, and Spang R (2006): Similarities of ordered gene lists, to appear in Journal of Bioinformatics and Computational Biology.

Efron B, Tibshirani R, Storey JD, and Tusher V (2001): Empirical Bayes analysis of a microarray experiment, Journal of the American Statistical Association 96, 1151–1160.

See Also

prepareData, OL.data, OL.result, plot.OrderedList, print.OrderedList, compareLists

Examples

### Let's compare the two example studies.
### The first entries of 'out' both relate to bad prognosis.
### Hence the class labels match between the two studies
### and we can use 'OrderedList' with default 'beta=1'.
library(OrderedList)  # provides OL.data, prepareData and OrderedList
data(OL.data)
a <- prepareData(
  list(data = OL.data$breast, name = "breast", var = "Risk",
       out = c("high", "low"), paired = FALSE),
  list(data = OL.data$prostate, name = "prostate", var = "outcome",
       out = c("Rec", "NRec"), paired = FALSE),
  mapping = OL.data$map
)
## Not run: 
OL.result <- OrderedList(a)

## End(Not run)

### The same comparison was done beforehand.
data(OL.result)
OL.result
plot(OL.result)

OrderedList documentation built on Nov. 8, 2020, 5:41 p.m.