overLapper: Set Intersect and Venn Diagram Functions


overLapper R Documentation



Description

Computes Venn intersects or standard intersects among large numbers of label sets provided as a list of vectors. The resulting intersect objects can be used to plot 2-5 way Venn diagrams or intersect bar plots with the functions vennPlot or olBarplot, respectively. The overLapper function scales to roughly 2-20 label vectors for Venn intersect calculations and to much larger numbers of label sets for standard intersects. The different intersect types are explained below under the definition of the type argument. The upper limit of around 20 label sets for Venn intersects is unavoidable because the number of Venn intersects grows exponentially with the number of label sets n, namely as 2^n - 1. The current implementation of the plotting function vennPlot supports Venn diagrams for 2-5 label sets. To analyze larger numbers of label sets visually, a variety of intersect methods are introduced in the olBarplot help file. These methods scale much better than Venn diagrams, but lack their restrictive intersect logic.
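The exponential growth mentioned above is easy to check in base R. The following sketch only evaluates the formula 2^n - 1 for a few set numbers; the variable names are chosen here for illustration and are not part of the package.

```r
## Number of Venn intersects (non-empty set combinations) for n label sets
n <- c(2, 3, 5, 10, 20)
venn_count <- 2^n - 1
data.frame(n = n, venn_intersects = venn_count)

## Cross-check for n = 5: sum the combinations over all complexity levels
sum(choose(5, 1:5)) == 2^5 - 1
```

At n = 20 the count already exceeds one million, which illustrates why Venn intersects become impractical beyond roughly that point.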


Usage

overLapper(setlist, complexity = "default", sep = "_", cleanup = FALSE, keepdups = FALSE, type)



Arguments

setlist: Object of class list in which each component stores one label set as a vector, and the name of each component is the name of its label set. These names are used for naming the label sets in all downstream analysis steps and plots.


complexity: Complexity level of intersects, specified as an integer vector. Venn intersects require all levels, 1:length(setlist), which is the default. With complexity=2 the function returns all pairwise intersects.
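As a rough illustration of what a complexity level means, the base R sketch below enumerates set-name combinations with combn. The set names and the "_" separator are placeholders, and this is not the package's internal code.

```r
## Hypothetical set names, used only for illustration
set_names <- c("A", "B", "C", "D")

## Complexity level 2: all pairwise combinations of set names
pairs <- combn(set_names, 2, FUN = paste, collapse = "_")
pairs

## Levels 1:length(set_names): every combination needed for Venn intersects
all_levels <- unlist(lapply(seq_along(set_names), function(k)
    combn(set_names, k, FUN = paste, collapse = "_")))
length(all_levels)
```

For four sets, the full range of levels yields 2^4 - 1 combinations, whereas level 2 alone yields only the six pairs.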


sep: Character used to separate set labels in the names of the intersect sets.


cleanup: If set to TRUE, all characters of the label sets are converted to upper case, and leading and trailing spaces are removed. The default cleanup=FALSE omits this step.


keepdups: By default (keepdups=FALSE) all duplicates are removed from the label sets. The setting keepdups=TRUE retains duplicates by appending a counter to each entry.
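The effect of the cleanup and keepdups settings can be approximated in base R. This is a minimal sketch: the example labels are made up, and the "-1", "-2" counter format is an assumption that may differ from what overLapper actually appends.

```r
## Hypothetical label vector with mixed case, padding and one duplicate
labels <- c(" geneA", "genea ", "geneB")

## Roughly what cleanup = TRUE does: upper case plus trimmed whitespace
cleaned <- toupper(trimws(labels))
cleaned

## Default behaviour (keepdups = FALSE): duplicates are dropped
unique(cleaned)

## Idea behind keepdups = TRUE: keep duplicates distinct by appending a counter
## (the "-" suffix format is an assumption, not the package's format)
counter <- ave(seq_along(cleaned), cleaned, FUN = seq_along)
with_counter <- paste(cleaned, counter, sep = "-")
with_counter
```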


type: With the setting type="vennsets" the overLapper function computes the typical Venn intersects for the label sets provided under setlist. With the setting type="intersects" the function computes standard intersects (not compatible with Venn diagrams). The usage above declares no default value for type, so it should be specified explicitly. Venn intersects follow the typical 'only in' intersect logic of Venn comparisons, such as: labels present only in set A, labels present only in the intersect of A & B, etc. Due to this restrictive intersect logic, the combined Venn sets contain no duplicates. In contrast, standard intersects follow this logic: labels present in the intersect of A & B, labels present in the intersect of A & B & C, etc. This approach usually results in many duplications of labels among the intersect sets.
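The difference between the two intersect types can be reproduced by hand for three small sets. The sketch below uses only base R set operations and made-up labels; it illustrates the logic described above and is not how overLapper is implemented.

```r
## Hypothetical label sets
A <- c("a", "b", "c", "d")
B <- c("b", "c", "e")
C <- c("c", "d", "e", "f")

## Standard intersects: labels shared by the named sets, regardless of other sets
std <- list(A_B = intersect(A, B),
            A_C = intersect(A, C),
            B_C = intersect(B, C),
            A_B_C = Reduce(intersect, list(A, B, C)))

## Venn intersects: labels found in exactly the named sets and in no other set
venn <- list(A = setdiff(A, union(B, C)),
             B = setdiff(B, union(A, C)),
             C = setdiff(C, union(A, B)),
             A_B = setdiff(intersect(A, B), C),
             A_C = setdiff(intersect(A, C), B),
             B_C = setdiff(intersect(B, C), A),
             A_B_C = Reduce(intersect, list(A, B, C)))

anyDuplicated(unlist(venn))  # 0: every label falls into exactly one Venn set
anyDuplicated(unlist(std))   # non-zero: label "c" recurs across standard intersects
```

Here the label "c" appears in every standard intersect but only once among the Venn sets, in A_B_C.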


Details

Additional Venn diagram resources are provided by the packages limma, gplots, Vennerable, eVenn and VennDiagram, or by online resources such as shapes, Venn Diagram Generator and Venny.


Value

overLapper returns standard intersect and Venn intersect results as INTERSECTset or VENNset objects, respectively. These S4 objects contain the following components:


Original label sets accessible with setlist().


Presence-absence matrix accessible with intersectmatrix(), where each overlap set in the vennlist data component is labeled according to the label set names provided under setlist. For instance, with the default separator a composite name such as 'A_B_C' indicates that the entries are restricted to A, B and C. The separator used for naming the intersect sets can be specified under the sep argument.


Complexity levels accessible with complexitylevels().


Venn intersects for VENNset objects accessible with vennlist().


Standard intersects for INTERSECTset objects accessible with intersectlist().
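The idea of a presence-absence matrix, as listed among the components above, can be pictured with a small base R sketch. The sets are made up, and the layout (labels in rows, label sets in columns) is an assumption for illustration; the matrix returned by intersectmatrix() may be organized differently.

```r
## Hypothetical label sets
sets <- list(A = c("a", "b", "c"), B = c("b", "c", "d"), C = c("c", "e"))

## Rows are labels, columns are label sets; 1 marks presence, 0 absence
all_labels <- sort(unique(unlist(sets)))
pa <- sapply(sets, function(s) as.integer(all_labels %in% s))
rownames(pa) <- all_labels
pa

## Row sums give the number of sets that contain each label
rowSums(pa)
```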


Note

The functions provided here are an extension of the Venn diagram resources on this site: http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual#TOC-Venn-Diagrams


Author(s)

Thomas Girke


References

See examples in 'The Electronic Journal of Combinatorics': http://www.combinatorics.org/files/Surveys/ds5/VennSymmExamples.html

See Also

vennPlot, olBarplot


Examples

## Sample data
setlist <- list(A=sample(letters, 18), B=sample(letters, 16),
                C=sample(letters, 20), D=sample(letters, 22),
                E=sample(letters, 18), F=sample(letters, 22))

## 2-way Venn diagram
vennset <- overLapper(setlist[1:2], type="vennsets")
vennPlot(vennset)

## 3-way Venn diagram
vennset <- overLapper(setlist[1:3], type="vennsets")
vennPlot(vennset)

## 4-way Venn diagram
vennset <- overLapper(setlist[1:4], type="vennsets")
vennPlot(list(vennset, vennset))

## Pseudo 4-way Venn diagram with circles
vennPlot(vennset, type="circle")

## 5-way Venn diagram
vennset <- overLapper(setlist[1:5], type="vennsets")
vennPlot(vennset)

## Alternative Venn count input to vennPlot (not recommended!)
counts <- sapply(vennlist(vennset), length)
vennPlot(counts)

## 6-way Venn comparison as bar plot
vennset <- overLapper(setlist[1:6], type="vennsets")
olBarplot(vennset, mincount=1)

## Bar plot of standard intersect counts
interset <- overLapper(setlist, type="intersects")
olBarplot(interset, mincount=1)

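## Accessor methods for VENNset/INTERSECTset objects
setlist(vennset)
intersectmatrix(vennset)
complexitylevels(vennset)
vennlist(vennset)
intersectlist(interset)

## Coerce VENNset/INTERSECTset object to list
## (assumes as.list methods are defined for both classes)
as.list(vennset)
as.list(interset)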

## Pairwise intersect matrix and heatmap
olMA <- sapply(names(setlist), function(x)
    sapply(names(setlist), function(y)
        sum(setlist[[x]] %in% setlist[[y]])))
heatmap(olMA, Rowv=NA, Colv=NA)

## Presence-absence matrices for large numbers of sample sets
interset <- overLapper(setlist=setlist, type="intersects", complexity=2)
(paMA <- intersectmatrix(interset))
heatmap(paMA, Rowv=NA, Colv=NA, col=c("white", "gray"))

tgirke/systemPipeR documentation built on Aug. 30, 2022, 10 p.m.