OrderedList: Detecting Similarities of Two Microarray Studies
In OrderedList: Similarities of Ordered Gene Lists


Function OrderedList compares comparisons. Given two expression studies, each of which yields one ranked (ordered) list of genes, we may observe considerable overlap among the top-scoring genes. OrderedList quantifies this overlap with a weighted similarity score in which the top-ranking genes contribute more than the genes further down the list. The final list of overlapping genes consists of those probes that together contribute a given percentage of the overall similarity score.

OrderedList(eset, B = 1000, test = "z", beta = 1, percent = 0.95, verbose = TRUE, alpha=NULL, min.weight=1e-5, empirical=FALSE)

`eset`	Expression set containing the two studies of interest. Use `prepareData` to generate `eset`.
`B`	Number of internal sub-samples needed to optimize alpha.
`test`	String, one of 'fc' (log ratio = log fold change), 't' (t-test with equal variances) or 'z' (t-test with regularized variances). The z-statistic is implemented as described in Efron et al. (2001).
`beta`	Either 1 or 0.5. In a comparison where the class labels of the studies match, we set `beta=1`. For example, in each single study the first class relates to bad prognosis while the second class relates to good prognosis. If such a matching is not possible, we set `beta=0.5`. For example, we compare a study with good/bad prognosis classes to a study in which the classes are two types of cancer tissue.
`percent`	The final list of overlapping genes consists of those probes that contribute a certain percentage to the overall similarity score. Default is `percent=0.95`. To get the full list of genes, set `percent=1`.
`verbose`	Logical value for message printing.
`alpha`	A vector of weighting parameters. If set to `NULL` (the default), the parameters are computed such that, across the candidate values, the number of top ranks receiving weights above `min.weight` ranges from 100 to 2500.
`min.weight`	The minimal weight to be taken into account while computing scores.
`empirical`	If `TRUE`, empirical confidence intervals will be computed by randomly permuting the class labels of each study. Otherwise, a hypergeometric distribution is used. Confidence intervals appear when using `plot.OrderedList`.
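The `test` argument determines how genes are scored before they are ranked. The sketch below computes the two textbook scores behind `test = "fc"` and `test = "t"` for a single study. It assumes a genes-by-samples matrix `x` of log-scale expression values and a two-level class vector `cl`; both, and the helper name `two_sample_scores`, are hypothetical and not the structure produced by `prepareData`. The regularized z-statistic of Efron et al. (2001) is not reproduced here.

```r
# Sketch: per-gene two-sample scores, then an ordered gene list.
# Assumed (hypothetical) input: 'x' is a genes-by-samples matrix of log-scale
# expression values with gene IDs as row names; 'cl' holds two class labels.
two_sample_scores <- function(x, cl, test = c("fc", "t")) {
  test <- match.arg(test)
  cl <- factor(cl)
  stopifnot(nlevels(cl) == 2)
  a <- x[, cl == levels(cl)[1], drop = FALSE]  # first class
  b <- x[, cl == levels(cl)[2], drop = FALSE]  # second class
  d <- rowMeans(a) - rowMeans(b)               # log ratio = log fold change
  if (test == "fc") return(d)
  n1 <- ncol(a); n2 <- ncol(b)
  # pooled within-class variance, as in the equal-variance t-test
  sp2 <- ((n1 - 1) * apply(a, 1, var) + (n2 - 1) * apply(b, 1, var)) / (n1 + n2 - 2)
  d / sqrt(sp2 * (1 / n1 + 1 / n2))
}

set.seed(1)
x <- matrix(rnorm(30), nrow = 5, dimnames = list(paste0("g", 1:5), NULL))
cl <- rep(c("high", "low"), each = 3)
tt <- two_sample_scores(x, cl, test = "t")
# Ordered list: genes most up-regulated in the first class come first
G <- names(sort(tt, decreasing = TRUE))
```

Positive scores mean higher expression in the first factor level; reversing `G` gives the flipped list used in the similarity score.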

In short, the similarity measure is computed as follows. Based on two-sample test statistics such as the t-statistic, the genes within each study are ranked from most up-regulated to most down-regulated, which yields one ordered list per study. For each rank n, counted both from the top (up-regulated end) and from the bottom (down-regulated end), we then count the number of overlapping genes. The total overlap A_n for rank n is defined as:

$$A_n = O_n(G_1, G_2) + O_n(f(G_1), f(G_2))$$

where G_1 and G_2 are the two ordered lists, f(G_1) and f(G_2) are the two flipped lists with the down-regulated genes on top, and O_n(X, Y) is the number of genes shared by the top n positions of its two arguments X and Y. A preliminary version of the weighted overlap over all ranks n is then given as:

$$T_\alpha(G_1, G_2) = \sum_n e^{-\alpha n} A_n.$$
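The two definitions above translate almost literally into R. The following is a minimal, naive sketch (quadratic in the list length), not the package's implementation; it assumes each ordered list is a character vector of the same gene IDs, sorted from most up-regulated to most down-regulated, and the helper names `overlap_at`, `total_overlap` and `weighted_overlap` are hypothetical.

```r
# Sketch of A_n and T_alpha in plain R (hypothetical helper names).
# Assumption: G1 and G2 contain the same gene IDs, each sorted from the
# most up-regulated gene (position 1) to the most down-regulated gene.

# O_n(X, Y): number of genes shared by the top n positions of X and Y
overlap_at <- function(X, Y, n) length(intersect(X[seq_len(n)], Y[seq_len(n)]))

# A_n = O_n(G1, G2) + O_n(f(G1), f(G2)) for every rank n; f() is rev()
total_overlap <- function(G1, G2) {
  sapply(seq_along(G1), function(n)
    overlap_at(G1, G2, n) + overlap_at(rev(G1), rev(G2), n))
}

# T_alpha(G1, G2) = sum_n exp(-alpha * n) * A_n
weighted_overlap <- function(G1, G2, alpha) {
  sum(exp(-alpha * seq_along(G1)) * total_overlap(G1, G2))
}

G1 <- c("g1", "g2", "g3", "g4", "g5")
G2 <- c("g1", "g3", "g2", "g5", "g4")
A <- total_overlap(G1, G2)                  # 1 3 5 7 10
T1 <- weighted_overlap(G1, G2, alpha = 1)
```

At the last rank both overlaps are complete, so A_N = 2N for lists of length N; a larger α makes the weights decay faster, so only the extremes of both lists influence the score.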

The final similarity score also covers the case in which the classes of the two studies cannot be matched exactly, so that we do not know whether up-regulation in one list corresponds to up- or down-regulation in the other list. This is where parameter β comes into play:

$$S_\alpha(G_1, G_2) = \max\{\beta\, T_\alpha(G_1, G_2),\ (1-\beta)\, T_\alpha(G_1, f(G_2))\}.$$
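To see how β switches the flipped comparison on or off, here is a self-contained sketch of S_α together with a `direction` flag in the spirit of the documented return value. The helper names are hypothetical, and the rule that `direction` compares the two β-weighted terms is an assumption, not taken from the package source.

```r
# Sketch of S_alpha with a direction flag (hypothetical helper names).
overlap_at <- function(X, Y, n) length(intersect(X[seq_len(n)], Y[seq_len(n)]))
total_overlap <- function(G1, G2) {
  sapply(seq_along(G1), function(n)
    overlap_at(G1, G2, n) + overlap_at(rev(G1), rev(G2), n))
}
weighted_overlap <- function(G1, G2, alpha) {
  sum(exp(-alpha * seq_along(G1)) * total_overlap(G1, G2))
}

# S_alpha(G1, G2) = max(beta * T(G1, G2), (1 - beta) * T(G1, f(G2))), f() = rev()
similarity <- function(G1, G2, alpha, beta = 1) {
  stopifnot(beta %in% c(0.5, 1))
  same    <- beta * weighted_overlap(G1, G2, alpha)
  flipped <- (1 - beta) * weighted_overlap(G1, rev(G2), alpha)
  # Assumed rule: 1 if the original orientation wins, -1 if the flipped one does
  list(score = max(same, flipped), direction = if (same >= flipped) 1 else -1)
}

G1 <- c("g1", "g2", "g3", "g4", "g5")
G2 <- rev(G1)   # same genes, but the class labels are swapped
similarity(G1, G2, alpha = 0.5, beta = 1)$direction    # 1: flipped term has weight 0
similarity(G1, G2, alpha = 0.5, beta = 0.5)$direction  # -1: flipping G2 recovers the match
```

With `beta = 1` the flipped term vanishes, so swapped labels simply produce a low score; with `beta = 0.5` both orientations compete on equal terms.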

Parameter β is set by the user but parameter α has to be tuned in a simulation using sub-samples and permutations of the original class labels.
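The link between α and `min.weight` (see the `alpha` argument) follows from the weight formula: rank n receives weight exp(-α n), which exceeds `min.weight` exactly when n < -log(min.weight)/α. Solving for α turns a target number of top ranks into a candidate weighting parameter. The sketch below uses a hypothetical grid of cutoffs between 100 and 2500 ranks and a hypothetical helper `alpha_for_cutoff`; the exact grid used by the package is not stated in this documentation.

```r
# Sketch: turn a target number of top ranks into a weighting parameter alpha.
# The cutoff grid is hypothetical (100 to 2500 ranks), not the package's own.
alpha_for_cutoff <- function(n_top, min.weight = 1e-5) {
  # exp(-alpha * n_top) == min.weight  <=>  alpha = -log(min.weight) / n_top
  -log(min.weight) / n_top
}

cutoffs <- c(100, 250, 500, 1000, 2500)
alpha <- alpha_for_cutoff(cutoffs)

# Count the ranks whose weight is still above min.weight for each alpha;
# the count matches the cutoff up to one rank (rank n_top sits on the boundary).
n_effective <- sapply(alpha, function(a) sum(exp(-a * seq_len(5000)) > 1e-5))
print(data.frame(cutoff = cutoffs, alpha = signif(alpha, 3), n_effective = n_effective))
```

Small α values spread the weight over thousands of ranks, large ones concentrate it on the extremes; the pAUC criterion described under `pauc` then picks one candidate.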

Returns an object of class OrderedList, which consists of a list with entries:

`n`	Total number of genes.
`label`	The concatenated study labels as provided by `eset`.
`p`	The p-value specifying the significance of the similarity.
`intersect`	Vector with sorted probe IDs of the overlapping genes, which contribute `percent` to the overall similarity score.
`alpha`	The optimal regularization parameter alpha.
`direction`	Numerical value. Returns '1' if the similarity score is higher for the originally ordered lists and '-1' if the score is higher for the comparison of one original to one flipped list. Of special interest if `beta=0.5`.
`scores`	Matrix of observed test scores with genes in rows and studies in columns.
`sim.scores`	List with four elements with output of the resampling with optimal `alpha`. `SIM.observed`: The observed similarity score. `SIM.alternative`: Vector of observed similarity scores simulated using sub-sampling within the distinct classes of each study. `SIM.random`: Vector of random similarity scores simulated by randomly permuting the class labels of each study. `subSample`: `TRUE` to indicate that sub-sampling was used.
`pauc`	Vector with pAUC-scores for each candidate of the regularization parameter α. The maximal pAUC-score defines the optimal α. See also `plot.OrderedList`.
`call`	List with some of the input parameters.
`empirical`	List with confidence interval values. Is `NULL` if `empirical=FALSE`.
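The `intersect` entry keeps the probes that contribute `percent` of the similarity score. One way to attribute the score to genes follows from swapping the sums in T_α = Σ_n e^{-αn} A_n: a gene at ranks r1 and r2 contributes Σ_{n ≥ max(r1, r2)} e^{-αn} through the top-end overlap, plus the analogous tail sum for its ranks in the flipped lists. The sketch below is hypothetical (helper names `gene_contributions` and `select_genes` are made up) and is not claimed to be how the package computes `intersect`.

```r
# Hypothetical sketch (not the package's code): per-gene contributions to
# T_alpha(G1, G2) and a 'percent' cutoff on them.
gene_contributions <- function(G1, G2, alpha) {
  N <- length(G1)
  w <- exp(-alpha * seq_len(N))
  tail_sum <- rev(cumsum(rev(w)))   # tail_sum[k] = sum of w_n for n >= k
  r1 <- seq_len(N)                  # rank of each gene of G1 in G1
  r2 <- match(G1, G2)               # rank of the same gene in G2
  top    <- tail_sum[pmax(r1, r2)]
  bottom <- tail_sum[pmax(N + 1 - r1, N + 1 - r2)]  # ranks in rev(G1), rev(G2)
  setNames(top + bottom, G1)
}

# Smallest set of genes whose contributions reach 'percent' of the total
select_genes <- function(contrib, percent = 0.95) {
  ordered <- sort(contrib, decreasing = TRUE)
  running <- cumsum(ordered)
  k <- which(running >= percent * running[length(running)])[1]
  sort(names(ordered)[seq_len(k)])  # sorted IDs, mirroring the 'intersect' entry
}

G1 <- c("g1", "g2", "g3", "g4", "g5")
G2 <- c("g1", "g3", "g2", "g5", "g4")
contrib <- gene_contributions(G1, G2, alpha = 1)
select_genes(contrib, percent = 0.4)   # "g1"
select_genes(contrib, percent = 0.95)  # all five genes in this tiny example
```

The contributions sum exactly to T_α; in realistic lists with thousands of genes, most middle-ranked genes contribute almost nothing, so `percent = 0.95` keeps a short list.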

Xinan Yang, Claudio Lottaz, Stefanie Scheid

Yang X, Bentink S, Scheid S, and Spang R (2006): Similarities of ordered gene lists, to appear in Journal of Bioinformatics and Computational Biology.

Efron B, Tibshirani R, Storey JD, and Tusher V (2001): Empirical Bayes analysis of a microarray experiment, Journal of the American Statistical Association 96, 1151–1160.

prepareData, OL.data, OL.result, plot.OrderedList, print.OrderedList, compareLists

### Let's compare the two example studies.
### The first entries of 'out' both relate to bad prognosis.
### Hence the class labels match between the two studies
### and we can use 'OrderedList' with default 'beta=1'.
data(OL.data)
a <- prepareData(
  # One list per study; the first entry of 'out' is the bad-prognosis class.
  list(data = OL.data$breast, name = "breast", var = "Risk",
       out = c("high", "low"), paired = FALSE),
  list(data = OL.data$prostate, name = "prostate", var = "outcome",
       out = c("Rec", "NRec"), paired = FALSE),
  mapping = OL.data$map
)
## Not run: 
OL.result <- OrderedList(a)

## End(Not run)

### The same comparison has been precomputed and is available as 'OL.result'.
data(OL.result)
OL.result
plot(OL.result)