```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")

# Skip evaluation of all chunks on CRAN's auto-check farm to fit the
# 10-minute build budget. Locally, on CI, and under devtools::check(),
# NOT_CRAN=true and all chunks evaluate normally. The vignette source
# (which CRAN users see in browseVignettes() / vignette()) is unchanged.
NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), "true")
knitr::opts_chunk$set(eval = NOT_CRAN)
```
This vignette uses the bundled dataset_real_cancer_drivers_4 dataset to
illustrate a real biological analysis: how do four canonical cancer driver
catalogs overlap?
The four sources are:
```r
library(vennDiagramLab)

ds <- load_sample("dataset_real_cancer_drivers_4")
ds@set_names

# Number of genes in each source catalog
sapply(ds@items, length)
```
The lists differ widely in size: Vogelstein is the smallest and most tightly curated set, while OncoKB is the most permissive at this annotation tier.
The dataset was built from a 20,000-gene background (universe_size):
```r
ds@universe_size
```
This is the population N used in the hypergeometric over-representation
tests (see vignette("v05_statistics_deep_dive")).
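To make the role of N concrete, here is a minimal base-R sketch of a one-sided hypergeometric over-representation test for a single pair of sets. The set sizes and overlap below are hypothetical numbers chosen for illustration, not values from this dataset, and it is an assumption that vennDiagramLab uses exactly this upper-tail form (vignette("v05_statistics_deep_dive") describes what the package actually computes).

```r
# Hypothetical numbers for illustration only (not taken from the dataset).
N   <- 20000  # universe size: background genes
n_a <- 150    # size of set A
n_b <- 700    # size of set B
x   <- 90     # observed overlap |A ∩ B|

# Expected overlap if A and B were independent draws from the universe
expected <- n_a * n_b / N

# Upper-tail p-value P(X >= x), where X ~ Hypergeometric:
# draw n_a genes from N, of which n_b count as "successes" (members of B).
# phyper(q, ..., lower.tail = FALSE) gives P(X > q), so q = x - 1.
p_value <- phyper(x - 1, m = n_b, n = N - n_b, k = n_a, lower.tail = FALSE)

expected
p_value
```

A larger N makes the same overlap less likely under independence, which is why the universe size matters as much as the set sizes.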
```r
result <- analyze(ds)
result@model
length(result@regions)
```
The default model for 4 sets is venn-4-set (Edwards-style).
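A Venn diagram of n sets has one region for every non-empty combination of memberships, so 4 sets give 2^4 - 1 = 15 regions. The base-R sketch below enumerates those combinations and labels each one by the letters of the sets it belongs to. Whether `result@regions` stores exactly these 15 patterns (rather than, say, also an empty "outside" region) and uses this letter-style labelling is an assumption here.

```r
sets <- c("A", "B", "C", "D")

# Every TRUE/FALSE membership pattern across the 4 sets (2^4 = 16 rows)
patterns <- expand.grid(rep(list(c(FALSE, TRUE)), length(sets)))
names(patterns) <- sets

# Drop the all-FALSE row: a gene in no set lies outside every region
patterns <- patterns[rowSums(patterns) > 0, ]
nrow(patterns)  # 2^4 - 1

# Label each region by the sets it belongs to, e.g. "AB" or "ACD"
region_labels <- apply(patterns, 1, function(row) paste(sets[row], collapse = ""))
region_labels
```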
```r
result@set_sizes
```
broom::glance() returns a one-row tibble with the headline numbers:
```r
broom::glance(result)
```
The default render uses the dataset's set names as labels. To shorten them for the diagram, pass a per-letter override:
```r
svg <- render_venn_svg(
  result,
  set_names = c(A = "Vogelstein", B = "COSMIC", C = "OncoKB", D = "IntOGen"),
  title = "Cancer driver overlap (4 sources)"
)
nchar(svg)
```
(See vignette("v08_custom_styling_and_export") for color overrides and
post-render SVG manipulation.)
For four or more sets, an UpSet plot is often easier to read than a Venn diagram: each intersection is drawn as a bar, and sort_by = "size" orders the bars by intersection size.
```r
upset_plot <- render_upset(result, sort_by = "size")
upset_plot
```
(The chunk above is gated on R >= 4.6 because the CRAN release of
ComplexUpset (1.3.3) is incompatible with ggplot2 >= 4.0 on older R;
see ?vennDiagramLab::render_upset for context.)
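The bar heights in an UpSet plot are exclusive intersection counts: each gene is assigned to exactly one membership pattern. The sketch below shows that bookkeeping in base R on three small, hypothetical gene sets; it illustrates the idea only and is not how render_upset() or ComplexUpset compute their bars internally.

```r
# Hypothetical toy gene sets, for illustration only
sets <- list(
  A = c("TP53", "KRAS", "PIK3CA", "APC"),
  B = c("TP53", "KRAS", "EGFR", "BRAF", "APC"),
  C = c("TP53", "EGFR", "MYC")
)
genes <- sort(unique(unlist(sets)))

# Membership matrix: one row per gene, one logical column per set
membership <- sapply(sets, function(s) genes %in% s)
rownames(membership) <- genes

# Each gene's exclusive intersection, e.g. "A&B" = in A and B but not C
pattern <- apply(membership, 1, function(row) paste(names(sets)[row], collapse = "&"))

# Bar heights, largest first, as with sort_by = "size"
bar_heights <- sort(table(pattern), decreasing = TRUE)
bar_heights
```

Because every gene lands in exactly one pattern, the bar heights always sum to the number of unique genes.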
broom::tidy() returns one row per set pair, with all five pairwise metrics
plus the BH-FDR-adjusted hypergeometric p-value:
```r
top_pairs <- broom::tidy(result)
top_pairs[
  order(top_pairs$p_adjusted),
  c("set_a", "set_b", "intersection", "jaccard", "p_adjusted", "significant")
]
```
Every pair is significant at FDR < 0.05. This is expected: all four catalogs aim to capture the same underlying cancer biology, so substantial overlap is by design.
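For intuition about two of these columns, the base-R sketch below computes the Jaccard and Dice indices for one hypothetical pair of gene sets and then applies the Benjamini-Hochberg (BH) adjustment to a vector of hypothetical raw p-values, once with stats::p.adjust() and once by hand. The toy sets and p-values are assumptions for illustration; the exact columns and computation used by vennDiagramLab are described in vignette("v05_statistics_deep_dive").

```r
# Hypothetical toy sets (not from the bundled dataset)
a <- c("TP53", "KRAS", "PIK3CA", "APC")
b <- c("TP53", "KRAS", "EGFR")

n_shared <- length(intersect(a, b))               # |A ∩ B|
jaccard  <- n_shared / length(union(a, b))        # |A ∩ B| / |A ∪ B|
dice     <- 2 * n_shared / (length(a) + length(b)) # 2|A ∩ B| / (|A| + |B|)

# BH adjustment of hypothetical raw p-values, one per set pair
p_raw <- c(0.001, 0.01, 0.03, 0.04)
p_bh  <- p.adjust(p_raw, method = "BH")

# Same adjustment by hand: scale the i-th smallest p by m / i, then take a
# running minimum from the largest p downwards so adjusted values stay monotone
m <- length(p_raw)
o <- order(p_raw, decreasing = TRUE)
p_manual <- pmin(1, cummin(p_raw[o] * m / (m:1)))[order(o)]

c(jaccard = jaccard, dice = dice)
p_bh
```

A pair is then flagged as significant when its adjusted p-value falls below the chosen FDR threshold (0.05 in the table above).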
broom::augment() returns one row per gene with set-membership flags and
the region label.
```r
gene_table <- broom::augment(result)
head(gene_table)
nrow(gene_table)                # total unique genes across all four sets
table(gene_table$region_label)  # how many genes in each region
```
To save the per-region summary as a tab-separated file:

```r
to_region_summary_tsv(result, "cancer_drivers_regions.tsv")
```
- vignette("v05_statistics_deep_dive"): interpret the Jaccard, Dice, and hypergeometric numbers in detail.
- vignette("v07_pdf_reports"): turn this analysis into a multi-page PDF.
- vignette("v08_custom_styling_and_export"): customize colors, embed in a ggplot, and export to PDF/PNG.