knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This tutorial shows how to use RVenn
, a package for dealing with multiple
sets. The base R functions (intersect
, union
and setdiff
) only work with
two sets. %>%
can be used from magrittr
but, for many sets this can be
tedious. reduce
function from purrr
package also provides a solution, which
is the function that is used for set operations in this package. The functions
overlap
, unite
and discern
abstract away the details, so one can just
construct the universe and choose the sets to operate by index or set name.
Further, by using ggvenn
Venn diagram can be drawn for 2-3 sets. As you can
notice from the name of the function, ggvenn
is based on ggplot2
, so it is a
neat way to show the relationship among a reduced number sets. For many sets, it
is much better to use UpSet or setmap
function provided within this package. Finally, by using enrichment_test
function, the p-value of an overlap between two sets can be calculated. Here,
the usage of all these functions will be shown.
This chunk of code will create 10 sets with sizes ranging from 5 to 25.
library(purrr) library(RVenn) library(ggplot2)
set.seed(42) toy = map(sample(5:25, replace = TRUE, size = 10), function(x) sample(letters, size = x)) toy[1:3] # First 3 of the sets.
toy = Venn(toy)
Intersection of all sets:
overlap(toy)
Intersection of selected sets (chosen with set names or indices, respectively):
overlap(toy, c("Set_1", "Set_2", "Set_5", "Set_8"))
overlap(toy, c(1, 2, 5, 8))
overlap_pairs(toy, slice = 1:4)
Union of all sets:
unite(toy)
Union of selected sets (chosen with set names or indices, respectively):
unite(toy, c("Set_3", "Set_8"))
unite(toy, c(3, 8))
unite_pairs(toy, slice = 1:4)
discern(toy, 1, 8)
discern(toy, "Set_1", "Set_8")
discern(toy, c(3, 4), c(7, 8))
discern_pairs(toy, slice = 1:4)
For two sets:
ggvenn(toy, slice = c(1, 5))
For three sets:
ggvenn(toy, slice = c(3, 6, 8))
setmap(toy)
Without clustering
setmap(toy, element_clustering = FALSE, set_clustering = FALSE)
er = enrichment_test(toy, 6, 7) er$Significance qplot(er$Overlap_Counts, geom = "blank") + geom_histogram(fill = "lemonchiffon4", bins = 8, color = "black") + geom_vline(xintercept = length(overlap(toy, c(6, 7))), color = "firebrick2", size = 2, linetype = "dashed", alpha = 0.7) + ggtitle("Null Distribution") + theme(plot.title = element_text(hjust = 0.5)) + scale_x_continuous(name = "Overlap Counts") + scale_y_continuous(name = "Frequency")
The test above, of course, is not very meaningful as we randomly created the
sets; therefore, we get a high p-value. However, when you are working with
actual data, e.g. to check if a motif is enriched in the promoter regions of the
genes in a gene set, you can use this test. In that case, set1
will be the
gene set of interest, set2
will be the all the genes that the motif is found
in the genome and univ
will be all the genes of a genome.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.