Assess the overlap between two or three lists.

Description

Assess the overlap between two or three lists, e.g. ChIP-Seq peaks vs. genes selected from a microarray, or peaks obtained in different experiments.

Usage

listOverlap(list1, list2, list3, univ, ...)

Arguments

list1

Vector with elements in the first list. This can either be a character vector indicating the element names, or a named factor vector indicating some classification for the elements in the first list.

list2

Vector with elements in the second list. This should be a character vector indicating the element names.

list3

Vector with elements in the third list. This should be a character vector indicating the element names. The overlap assessment method used depends on whether this argument is specified or not. See details.

univ

Character vector indicating the universe of all elements from which list1 and list2 were obtained. The overlap assessment depends on whether this argument is specified or not. See details.

...

Further arguments to be passed on to chisq.test when two lists are compared.

Details

For signature(list1='character', list2='character', list3='missing', univ='character') the overlap is assessed with respect to the universe of all possible elements univ. That is, we count the number of elements that are common to list1 and list2, those appearing only in either list1 or list2, and those not appearing in either (but appearing in univ). A typical example: list1 contains names of genes with a peak in ChIP-Seq experiment 1, list2 names of genes with a peak in ChIP-Seq experiment 2, and univ the names of all genes in the organism.
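
The counting described above can be sketched in base R. This is only an illustration, not the code that listOverlap runs internally. The gene names are made up, and the call to chisq.test reflects the test named in the Value section rather than a confirmed implementation detail.

```r
# Hypothetical inputs, chosen only for illustration
univ  <- paste('Gene', 1:7)
list1 <- c('Gene 6', 'Gene 7')
list2 <- c('Gene 1', 'Gene 2', 'Gene 7')

# Membership of every universe element in each list
inList1 <- factor(univ %in% list1, levels = c(FALSE, TRUE))
inList2 <- factor(univ %in% list2, levels = c(FALSE, TRUE))

# 2 x 2 table: in neither list, only in list1, only in list2, in both
counts <- table(inList1 = inList1, inList2 = inList2)
counts

# Chi-square test of independence on the table; warnings about small
# expected counts are suppressed because this toy universe is tiny
pval <- suppressWarnings(chisq.test(counts))$p.value
pval
```

With real data the universe would typically hold thousands of genes, so the small-count warning suppressed here would not normally arise.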

For signature(list1='character', list2='character', list3='character', univ='character') the overlap is assessed by fitting linear models and comparing them with anova. The comparison tests whether the 3-way overlap is significant with respect to the universe of all possible elements univ, relative to a model that considers only the combinations of 2-way overlaps. A typical example: list1, list2 and list3 contain names of genes with peaks in three different ChIP-Seq experiments, and univ the names of all genes in the organism.
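
One way such a model comparison could look is sketched below. The choice of Poisson log-linear models fitted with glm is an assumption made for illustration; the models that listOverlap actually fits may differ. The object names xtab, glm1, glm2 and pvalue are borrowed from the Value section only for readability, and the gene lists are made up.

```r
# Hypothetical universe and lists, chosen so that every cell of the
# 2 x 2 x 2 membership table is non-empty
univ  <- paste('Gene', 1:40)
list1 <- univ[1:20]
list2 <- univ[c(1:10, 21:30)]
list3 <- univ[c(1:6, 11:14, 21:24, 31:36)]

# Membership indicators for each element of the universe
membership <- data.frame(
  in1 = factor(univ %in% list1, levels = c(FALSE, TRUE)),
  in2 = factor(univ %in% list2, levels = c(FALSE, TRUE)),
  in3 = factor(univ %in% list3, levels = c(FALSE, TRUE))
)

# Occurrence table in long format: one row per membership combination
xtab <- as.data.frame(table(membership))

# Assumed models: all 2-way interactions vs. added 3-way interaction
glm1 <- glm(Freq ~ (in1 + in2 + in3)^2, family = poisson, data = xtab)
glm2 <- glm(Freq ~ in1 * in2 * in3, family = poisson, data = xtab)

# Likelihood-ratio comparison of the nested models
cmp <- anova(glm1, glm2, test = 'Chisq')
pvalue <- cmp[2, 'Pr(>Chi)']
pvalue
```

A small pvalue in this sketch would indicate that the 3-way term explains the counts better than the 2-way overlaps alone.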

For signature(list1='factor', list2='character', list3='missing', univ='missing') the distribution of list1 is compared between elements appearing and not appearing in list2. A typical example: list1 indicates the differential expression status for a number of genes, and list2 contains the names of the genes which had a peak in a ChIP-Seq experiment.
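
The comparison described above amounts to cross-tabulating the classification in list1 against membership in list2. The following base R sketch shows that tabulation under the assumption that element names are matched exactly; it is not the internal code of listOverlap, and the data are made up.

```r
# Hypothetical classification: differential expression status per gene
deStatus <- factor(c(0, 0, 0, 0, 1, 1, 1))
names(deStatus) <- paste('Gene', 1:7)

# Hypothetical list of genes with a peak
peaks <- c('Gene 6', 'Gene 7')

# Does each classified gene appear in the peak list?
hasPeak <- factor(names(deStatus) %in% peaks, levels = c(FALSE, TRUE))

# Distribution of the status among genes with and without a peak
statusByPeak <- table(status = deStatus, hasPeak = hasPeak)
statusByPeak
```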

Value

For the comparison of two lists, an htest object from a chi-square test that evaluates whether the two lists are statistically independent of each other. This is a named list: the observed overlap is stored in observed and the P-value in p.value.

For the comparison of three lists, a list object containing the occurrence and frequency tables (xtab, ftable), the fitted linear models (glm1, glm2), and the anova P-value (pvalue).

Methods

signature(list1 = "character", list2 = "character", list3 = "character", univ = "character")

Studies 3-way associations.

signature(list1 = "character", list2 = "character", list3 = "missing", univ = "character")

Studies bivariate associations.

signature(list1 = "factor", list2 = "character", list3 = "missing", univ = "missing")

Studies bivariate associations.

Examples

#Overlap between differential expression and ChIP-Seq peaks
deStatus <- factor(c(0,0,0,0,1,1,1))
names(deStatus) <- paste('Gene',1:7)
peaks <- c('Gene 6','Gene 7')
ans <- listOverlap(list1=deStatus,list2=peaks)
ans$observed
ans$p.value

#Overlap between peaks obtained from two different experiments
peaks2 <- c('Gene 1','Gene 2','Gene 7')
univ <- paste('Gene',1:7)
ans <- listOverlap(list1=peaks,list2=peaks2,univ=univ)
ans$observed
ans$p.value
