Function OrderedList aims for the comparison of comparisons: given two
expression studies with one ranked (ordered) list of genes each, we might
observe considerable overlap among the top-scoring genes. OrderedList
quantifies this overlap by computing a weighted similarity score, where the
top-ranking genes contribute more to the score than the genes further down
the list. The final list of overlapping genes consists of those probes that
contribute a certain percentage to the overall similarity score.
OrderedList(eset, B = 1000, test = "z", beta = 1, percent = 0.95,
            verbose = TRUE, alpha = NULL, min.weight = 1e-5, empirical = FALSE)
eset |
Expression set containing the two studies of interest. Use |
B |
Number of internal sub-samples needed to optimize alpha. |
test |
String, one of 'fc' (log ratio = log fold change), 't' (t-test with equal variances) or 'z' (t-test with regularized variances). The z-statistic is implemented as described in Efron et al. (2001). |
beta |
Either 1 or 0.5. In a comparison where the class labels of the studies match, we set |
percent |
The final list of overlapping genes consists of those probes that contribute a certain percentage to the overall similarity score. Default is 0.95. |
verbose |
Logical value for message printing. |
alpha |
A vector of weighting parameters. If set to NULL (the default),
parameters are computed such that top 100 to the top 2500 ranks receive
weights above |
min.weight |
The minimal weight to be taken into account while computing scores. |
empirical |
If |
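As a rough illustration of how a grid of alpha values can be tied to ranks and a minimal weight: if rank n receives the weight exp(-alpha * n), as in the weighted overlap formula of this page, then solving exp(-alpha * n) = min.weight gives one alpha per target rank n. This is a sketch under that assumption, not the package's internal code; the target ranks and the names `min_weight`, `ranks`, `alpha_grid` and `weight_at` are hypothetical.

```r
# Sketch only: assumes rank n is weighted by exp(-alpha * n) and that each
# alpha is chosen so that the weight at a target rank equals min_weight.
min_weight <- 1e-5
ranks <- c(100, 500, 1000, 2500)        # hypothetical target ranks

# Solve exp(-alpha * n) = min_weight for alpha, one value per target rank
alpha_grid <- -log(min_weight) / ranks

# Weight given to rank n under a given alpha
weight_at <- function(alpha, n) exp(-alpha * n)

# Weight at each target rank under its own alpha
w <- mapply(weight_at, alpha_grid, ranks)
```

A larger alpha concentrates the score on fewer top ranks, so the grid runs from a steep weighting (target rank 100) to a flat one (target rank 2500).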
In short, the similarity measure is computed as follows: Based on two-sample test statistics like the t-test, genes within each study are ranked from most up-regulated down to most down-regulated. Thus we have one ordered list per study. Now for each rank going both from top (up-regulated end) and from bottom (down-regulated end) we count the number of overlapping genes. The total overlap A_n for rank n is defined as:
A_n = O_n (G_1,G_2) + O_n(f(G_1),f(G_2))
where G_1 and G_2 are the two ordered lists, f(G_1) and f(G_2) are the two flipped lists with the down-regulated genes on top and O_n is the size of the overlap of the top n genes of its two arguments. A preliminary version of the weighted overlap over all ranks n is then given as:
T_α(G_1,G_2) = ∑_n exp(-α n) A_n.
The final similarity score includes the case that we cannot match the classes in each study exactly and thus do not know whether up-regulation in one list corresponds to up- or down-regulation in the other list. Here parameter β comes into play:
S_α(G_1,G_2) = max{ β T_α(G_1,G_2), (1-β) T_α(G_1,f(G_2)) }.
Parameter β is set by the user but parameter α has to be tuned in a simulation using sub-samples and permutations of the original class labels.
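The three formulas above can be traced on a toy example. The following is a minimal sketch, not the package's implementation: it assumes each list is a character vector of gene IDs ordered from most up-regulated to most down-regulated, takes α as given instead of tuning it, and uses hypothetical helper names (`overlap_sizes`, `total_overlap`, `weighted_overlap`, `similarity_score`).

```r
# Toy sketch of the similarity score; helper names are hypothetical.
overlap_sizes <- function(g1, g2) {
  # O_n: number of genes shared by the top n entries of both lists, n = 1..N
  vapply(seq_along(g1),
         function(n) length(intersect(g1[1:n], g2[1:n])),
         integer(1))
}

total_overlap <- function(g1, g2) {
  # A_n = O_n(G_1, G_2) + O_n(f(G_1), f(G_2)), where f() reverses a list
  overlap_sizes(g1, g2) + overlap_sizes(rev(g1), rev(g2))
}

weighted_overlap <- function(g1, g2, alpha) {
  # T_alpha = sum over n of exp(-alpha * n) * A_n
  n <- seq_along(g1)
  sum(exp(-alpha * n) * total_overlap(g1, g2))
}

similarity_score <- function(g1, g2, alpha, beta = 1) {
  # S_alpha = max(beta * T(G_1, G_2), (1 - beta) * T(G_1, f(G_2)))
  max(beta * weighted_overlap(g1, g2, alpha),
      (1 - beta) * weighted_overlap(g1, rev(g2), alpha))
}

g1 <- c("a", "b", "c", "d", "e", "f")
g2 <- c("b", "a", "c", "f", "e", "d")

A <- total_overlap(g1, g2)   # 0 3 6 7 8 12

# Class labels match: only the direct comparison counts (beta = 1)
s_same <- similarity_score(g1, g2, alpha = 0.5, beta = 1)

# Direction unknown: second list arrives flipped, beta = 0.5 weighs both cases
s_flip <- similarity_score(g1, rev(g2), alpha = 0.5, beta = 0.5)
```

With beta = 0.5 the score of the flipped input is exactly half of `s_same`, because the maximum picks the comparison in which the second list is flipped back.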
Returns an object of class OrderedList, which consists of a list with entries:
n |
Total number of genes. |
label |
The concatenated study labels as provided by |
p |
The p-value specifying the significance of the similarity. |
intersect |
Vector with sorted probe IDs of the overlapping genes, which contribute |
alpha |
The optimal regularization parameter alpha. |
direction |
Numerical value. Returns '1' if the similarity score is higher for the originally ordered lists and '-1' if the score is higher for the comparison of one original to one flipped list. Of special interest if |
scores |
Matrix of observed test scores with genes in rows and studies in columns. |
sim.scores |
List with four elements with output of the resampling with optimal |
pauc |
Vector with pAUC-scores for each candidate of the regularization parameter α. The maximal pAUC-score defines the optimal α. See also |
call |
List with some of the input parameters. |
empirical |
List with confidence interval values. Is |
Xinan Yang, Claudio Lottaz, Stefanie Scheid
Yang X, Bentink S, Scheid S, and Spang R (2006): Similarities of ordered gene lists, to appear in Journal of Bioinformatics and Computational Biology.
Efron B, Tibshirani R, Storey JD, and Tusher V (2001): Empirical Bayes analysis of a microarray experiment, Journal of the American Statistical Association 96, 1151–1160.
prepareData, OL.data, OL.result, plot.OrderedList, print.OrderedList, compareLists
### Let's compare the two example studies.
### The first entries of 'out' both relate to bad prognosis.
### Hence the class labels match between the two studies
### and we can use 'OrderedList' with default 'beta=1'.
data(OL.data)
a <- prepareData(
  list(data = OL.data$breast, name = "breast", var = "Risk",
       out = c("high", "low"), paired = FALSE),
  list(data = OL.data$prostate, name = "prostate", var = "outcome",
       out = c("Rec", "NRec"), paired = FALSE),
  mapping = OL.data$map
)
## Not run:
OL.result <- OrderedList(a)
## End(Not run)
### The same comparison was done beforehand.
data(OL.result)
OL.result
plot(OL.result)