knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
tidyGenomeBrowser
makes it easy to generate genome browser figures via R/Bioconductor. It acts as a thin layer of glue between the elaborate and efficient S4-system of Bioconductor and the widely used S3-system of the tidyverse.
Genome-browser images are simple, in the sense that they always share the same X-axis, specified by seqnames/start/end/strand information. tidyGenomeBrowser
will take Bioconductor S4-objects such as GRanges
, GPos
, GRangesList
and GInteractions
and automatically process them into tidy data.frame
object. These data.frame
objects can then either be returned for further manipulations, or plotted directly via ggplot2.
tidyGenomeBrowser
tries to use sensible defaults, making it very quick and well suited for interactive use. However, as all plotting data is returned as data.frame
or ggplot
objects, it's easy to manipulate them for publication grade figures.
tidyGenomeBrowser
functions take almost no arguments, but reacts to some common meta-columns used in Bioconductor (e.g. score
, pos
, thick
) or package-wide settings.
tidyGenomeBrowser
is currently only available on GitHub:
devtools::install_github("MalteThodberg/tidyGenomeBrowser")
tidyGenomeBrowser
contains 4 major functions, each capable oof plotting a specific type of data:
GRanges
.GRanges
.GInteractions
.GPos
Packages needed for this example:
library(tidyGenomeBrowser) library(TxDb.Hsapiens.UCSC.hg19.knownGene) library(magrittr) library(tidyverse)
The simplest case is just plotting data from a single object, called a single track:
CAGE_TCs
browseIntervals(CAGE_TCs)
The function will automatically set a proper plotting window in this case:
browseIntervals(CAGE_TCs[1:3])
Most often the goal is to plot multiple tracks together. This is done by setting up a plot region of interest, and then building a track for each data type.
Set up plot region around a gene of interest:
# Reference gene and transcript models txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene # Extract location of genes gene_models <- genes(txdb) RCOR1 <- gene_models["23186"] # Plotting window: 500 bp around AKAP2 gene_window <- RCOR1 + 500 # Window around promoter for second examples promoter_window <- promoters(RCOR1, upstream=500, downstream=500)
Extract transcript models:
tx_models <- exonsBy(txdb, by="tx", use.names=TRUE) meta_models <- exonsBy(txdb, by="gene")
Build individual tracks:
# GWAS gwas_track <- GWAS %>% transform(score=-log10(P)) %>% browsePositions(region=gene_window) # Meta-genes meta_track <- meta_models %>% browseTranscripts(region=gene_window) # Open chromatin signal_track <- ATAC_signal %>% browseSignal(region=gene_window) peak_track <- ATAC_peaks %>% browseIntervals(region=gene_window) # Transcription Start Sites tc_track <- CAGE_TCs %>% browseIntervals(region=gene_window) # Interactions contacts_track <- PCHiC %>% browseInteractions(region=gene_window) # Baits baits_track <- PCHiC %>% anchors %>% GRangesList %>% unique %>% browseIntervals(region=gene_window)
Plot everything together:
list(contacts_track, baits_track, gwas_track, meta_track, signal_track, peak_track, tc_track) %>% browseStack()
As all returned object are ggplots, it's easy to customize:
theme_set(theme_classic()) # Color GWAS track by SNP gwas_track2 <- GWAS %>% transform(score=-log10(P), color=A1) %>% browsePositions(region=gene_window) + scale_color_brewer(palette="Set1") # New customized plot list(contacts_track + ylab("Contacts"), baits_track + ylab("Baits"), gwas_track2 + ylab("GWAS (-log10 P value)"), meta_track + ylab("Meta Gene"), signal_track + ylab("ATAC Signal"), peak_track + ylab("ATAC Peaks"), tc_track + ylab("FANTOM5")) %>% browseStack(heights=c(3, 1, 2, 1, 2, 1, 1))
Package-wide settings can also be used (see ?tidyGenomeBrowser
for a complete list):
# Change strand colors options(tidyGenomeBrowser.strand=c(`+`="forestgreen", `-`="purple", `*`="grey")) # Change x label: options(tidyGenomeBrowser.prefix="Position on ") # Transcripts options(tidyGenomeBrowser.name=FALSE) tx_track <- meta_models %>% browseTranscripts(region=promoter_window) # Open chromatin signal_track <- ATAC_signal %>% browseSignal(region=promoter_window) # More wiggle room when stacking options(tidyGenomeBrowser.wiggle=100) tc_track <- CAGE_TCs %>% browseIntervals(region=promoter_window) ctss_track <- CAGE_CTSSs %>% browseSignal(region=promoter_window) # All tracks together and margins removed list(tx_track + ylab("Tx models"), signal_track + ylab("ATAC-Seq"), ctss_track + ylab("CTSSs"), tc_track + ylab("TCs")) %>% browseStack(heights=c(1, 2, 3, 1))
browseStack
can also trim margins around tracks if needed:
# Save list of tracks plot_list <- list(tx_track + ylab("Tx models"), signal_track + ylab("ATAC-Seq"), ctss_track + ylab("CTSSs"), tc_track + ylab("TCs")) # Plot using differente squeezings browseStack(plot_list, squeeze="none") + plot_annotation(title="Margins: No squeezing") browseStack(plot_list, squeeze="internal") + plot_annotation(title="Margins: Squeeze between plots") browseStack(plot_list, squeeze="all") + plot_annotation(title="Margins: Squeeze all margins")
All plotting functions can be intercepted, e.g. the raw data.frame
can be returned rather than plotted:
# Intercept ctss_data <- CAGE_CTSSs %>% browseSignal(plot=FALSE) # Look at the data head(ctss_data) # Plotting window stored as attribute? #attributes(ctss_data)
After manipulation, data can be replotted:
ctss_data %>% mutate(strand == "*", color=sample(x=letters[1:2], size=n(), replace=TRUE)) %>% browseSignal(region=promoter_window)
Note that no checks on the data is performed at this stage: This makes tidyGenomeBrowser very hackable, but you can also potentially screw up!
Below are some common recipes and tips & tricks for manipulating ggplots.
TBA
TBA
TBA...
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.