compareLists: Compare Ordered Lists with Weighted Overlap Score



Description

The two orderings passed as arguments are compared using the weighted overlap score. The observed score is then compared with a random distribution of that score, yielding an empirical p-value.

Usage

compareLists(ID.List1, ID.List2, mapping = NULL, 
             two.sided=TRUE, B = 1000, alphas = NULL, 
             invar.q = 0.5, min.weight = 1e-5,
             no.reverse=FALSE)

Arguments

ID.List1

first ordered list of identifiers to be compared.

ID.List2

second ordered list to be compared, must have the same length as ID.List1.

mapping

maps identifiers between the two lists. This is a matrix with two columns. Every item in ID.List1 must match exactly one entry in column 1 of the mapping, and every item in ID.List2 must match exactly one entry in column 2. If mapping is NULL, the two lists are expected to contain the same identifiers, with a one-to-one relationship between them.

two.sided

whether the score is to be computed considering both ends of the list, or just the top members.

B

the number of permutations used to estimate empirical p-values.

alphas

a set of alpha candidates to be evaluated. If set to NULL, alphas are chosen so that they correspond to reasonable maximal ranks to be considered.

invar.q

quantile of genes expected to be invariant. These are not used during shuffling, since they are expected to stay away from the ends of the lists, even when the data are perturbed to generate the null distribution. The default of 0.5 is reasonable for whole-genome gene expression analysis, but must be reconsidered when the compared lists are derived from other sources.

min.weight

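the minimal weight to be considered. Together with the alphas, it determines the maximal number of genes taken into account (see nn under Value).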

no.reverse

if TRUE, scores for the reversed second list are not computed.
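
The relation between alphas and min.weight can be illustrated with a short base R sketch. It assumes that the weight of a rank has the form exp(-alpha * rank). This help page does not state that form, but it is consistent with the example output further down, where alpha = 0.115 is paired with 100 genes at the default min.weight of 1e-5. The helper alpha_for_rank is hypothetical and not part of OrderedList.

```r
# Sketch only: assumes rank weights of the form exp(-alpha * rank).
# alpha_for_rank is a hypothetical helper, not an OrderedList function.

# Alpha at which the weight of rank n has decayed to min.weight
alpha_for_rank <- function(n, min.weight = 1e-5) {
  -log(min.weight) / n
}

# Maximal ranks as listed in the "Genes" column of the example output
ranks  <- c(100, 150, 200, 300, 400, 500, 750)
alphas <- alpha_for_rank(ranks)

round(alphas, 3)
# 0.115 0.077 0.058 0.038 0.029 0.023 0.015

# Under the same assumption, the weight at each maximal rank equals min.weight
weights_at_max_rank <- exp(-alphas * ranks)
```

Under this assumption, a larger alpha concentrates the score on fewer top-ranked identifiers, while a smaller alpha lets deeper ranks contribute.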

Details

The two lists received as arguments are matched against each other according to the given mapping. By default, the comparison is performed from both ends of the lists. Permutations of the lists are used to generate random scores, from which empirical p-values are computed. Unless no.reverse is TRUE, the evaluation is repeated with the second list reversed, to cover the case where the lists are more alike in reversed order. The set of overlapping list identifiers can be extracted from the resulting object with the function getOverlap.
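
The scoring and permutation steps described above can be sketched in base R as follows. This is a deliberately simplified illustration, not the package's implementation: it assumes an exponentially decaying rank weight exp(-alpha * n), looks only at the top of the lists, and permutes the whole second list, whereas compareLists also handles the bottom ends, invariant genes, and reversed lists. The function names weighted_overlap and empirical_p are hypothetical.

```r
# Sketch only: simplified, top-of-list weighted overlap score with a
# permutation-based empirical p-value. Names are hypothetical and the
# weight form exp(-alpha * n) is an assumption.

weighted_overlap <- function(list1, list2, alpha, max.rank = length(list1)) {
  n <- seq_len(max.rank)
  # Number of identifiers shared by the top k entries of both lists
  overlap <- vapply(
    n,
    function(k) length(intersect(list1[seq_len(k)], list2[seq_len(k)])),
    integer(1)
  )
  # Overlaps near the top of the lists receive the largest weights
  sum(exp(-alpha * n) * overlap)
}

empirical_p <- function(list1, list2, alpha, B = 1000) {
  observed <- weighted_overlap(list1, list2, alpha)
  # Null distribution: scores against randomly permuted copies of list2
  random <- replicate(B, weighted_overlap(list1, sample(list2), alpha))
  # Fraction of random scores at least as large as the observed score
  mean(random >= observed)
}

set.seed(1)
ids <- paste0("g", 1:50)
identical_score <- weighted_overlap(ids, ids, alpha = 0.1)
reversed_score  <- weighted_overlap(ids, rev(ids), alpha = 0.1)
p_same          <- empirical_p(ids, ids, alpha = 0.1, B = 200)
```

In this toy setting, two identical orderings give the highest possible score, so essentially no permutation reaches it and the empirical p-value is close to zero. Comparing a list with its own reverse gives a much lower score, because the tops of the two lists share identifiers only at deep ranks.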

Value

An object of class listComparison is returned. It contains the following list elements:

n

the length of the lists

call

the input parameters

nn

the maximal number of genes corresponding to the alphas and the minimal weight

scores

scores for the straight list comparisons

revScores

scores for the reversed list comparison

pvalues

p-values for the straight list comparison

revPvalues

p-values for the reversed list comparison

overlap

number of overlapping identifiers per rank in straight comparison

revOverlap

number of overlapping identifiers per rank in reversed comparison

randomScores

random scores per weighting parameter

ID.List1

same as input ID.List1

ID.List2

same as input ID.List2

There are print and plot methods for listComparison objects. The plot method takes an argument named which that specifies whether "overlap" or "density" is to be drawn.

Author(s)

Claudio Lottaz, Stefanie Scheid

References

Yang X, Bentink S, Scheid S, and Spang R (2006): Similarities of ordered gene lists, to appear in Journal of Bioinformatics and Computational Biology.

See Also

OrderedList, getOverlap

Examples

### Compare two artificial lists with some overlap
data(OL.data)
list1 <- as.character(OL.data$map$prostate)
list2 <- c(sample(list1[1:500]),sample(list1[501:1000]))
x <- compareLists(list1,list2)
x
getOverlap(x)

Example output

  Simulating random scores...
  0%.......:.........:.........:.........:......100%
  --------------------------------------------------
List comparison
  Assessing similarity of     : top and bottom ranks
  Length of lists             : 1000
  Quantile of invariant genes : 0.5
  Number of random samples    : 1000
--------------------------------------
      Genes      Scores p.values Rev.Scores Rev.p.values
0.115   100    1.073385    0.961   0.000000            1
0.077   150    7.490459    0.949   0.000000            1
0.058   200   24.187855    0.939   0.000000            1
0.038   300  104.685686    0.933   0.000000            1
0.029   400  272.780774    0.921   0.000000            1
0.023   500  559.029968    0.920   0.000000            1
0.015   750 1998.820224    0.923   7.062765            1
List comparison
  Assessing similarity of               : top and bottom ranks
  Length of lists                       : 1000
  Number of random samples              : 1000
----------------------------------------------------------
  Lists are more alike in direct order
  Chosen regularization parameter       : alpha = 0.023 ( 500 genes)
  Weighted overlap score                : 559.03
  Significance of similarity            : p-value = 0.92
  Score percentage for common entries   : 95
  Entries contributing score percentage : 313

OrderedList documentation built on Nov. 8, 2020, 5:41 p.m.