knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

tidyGenomeBrowser

tidyGenomeBrowser makes it easy to generate genome browser figures via R/Bioconductor. It acts as a thin layer of glue between the elaborate and efficient S4-system of Bioconductor and the widely used S3-system of the tidyverse.

Genome-browser images are simple, in the sense that they always share the same X-axis, specified by seqnames/start/end/strand information. tidyGenomeBrowser will take Bioconductor S4-objects such as GRanges, GPos, GRangesList and GInteractions and automatically process them into tidy data.frame object. These data.frame objects can then either be returned for further manipulations, or plotted directly via ggplot2.

tidyGenomeBrowser tries to use sensible defaults, making it very quick and well suited for interactive use. However, as all plotting data is returned as data.frame or ggplot objects, it's easy to manipulate them for publication grade figures.

tidyGenomeBrowser functions take almost no arguments, but reacts to some common meta-columns used in Bioconductor (e.g. score, pos, thick) or package-wide settings.

Installation

tidyGenomeBrowser is currently only available on GitHub:

devtools::install_github("MalteThodberg/tidyGenomeBrowser")

Overview

tidyGenomeBrowser contains 4 major functions, each capable oof plotting a specific type of data:

Quick start

Required packages

Packages needed for this example:

library(tidyGenomeBrowser)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(magrittr)
library(tidyverse)

Plotting a single track

The simplest case is just plotting data from a single object, called a single track:

CAGE_TCs

browseIntervals(CAGE_TCs)

The function will automatically set a proper plotting window in this case:

browseIntervals(CAGE_TCs[1:3])

Browsing multiple types of genomic data

Most often the goal is to plot multiple tracks together. This is done by setting up a plot region of interest, and then building a track for each data type.

Set up plot region around a gene of interest:

# Reference gene and transcript models
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Extract location of genes
gene_models <- genes(txdb)
RCOR1 <- gene_models["23186"]

# Plotting window: 500 bp around AKAP2
gene_window <- RCOR1 + 500

# Window around promoter for second examples
promoter_window <- promoters(RCOR1, upstream=500, downstream=500)

Extract transcript models:

tx_models <- exonsBy(txdb, by="tx", use.names=TRUE)
meta_models <- exonsBy(txdb, by="gene")

Build individual tracks:

# GWAS
gwas_track <- GWAS %>%
    transform(score=-log10(P)) %>%
    browsePositions(region=gene_window)

# Meta-genes
meta_track <- meta_models %>%
    browseTranscripts(region=gene_window)

# Open chromatin
signal_track <- ATAC_signal %>% 
    browseSignal(region=gene_window)
peak_track <- ATAC_peaks %>% 
    browseIntervals(region=gene_window)

# Transcription Start Sites
tc_track <- CAGE_TCs %>% 
    browseIntervals(region=gene_window)

# Interactions
contacts_track <- PCHiC %>%
    browseInteractions(region=gene_window)

# Baits
baits_track <- PCHiC %>%
    anchors %>%
    GRangesList %>% 
    unique %>%
    browseIntervals(region=gene_window)

Plot everything together:

list(contacts_track,
     baits_track,
     gwas_track,
     meta_track,
     signal_track,
     peak_track,
     tc_track) %>%
    browseStack()

Customizing plots

As all returned object are ggplots, it's easy to customize:

theme_set(theme_classic())

# Color GWAS track by SNP
gwas_track2 <- GWAS %>%
    transform(score=-log10(P),
              color=A1) %>%
    browsePositions(region=gene_window) +
    scale_color_brewer(palette="Set1")

# New customized plot
list(contacts_track + ylab("Contacts"),
     baits_track + ylab("Baits"),
     gwas_track2 + ylab("GWAS (-log10 P value)"),
     meta_track + ylab("Meta Gene"),
     signal_track + ylab("ATAC Signal"),
     peak_track + ylab("ATAC Peaks"),
     tc_track + ylab("FANTOM5")) %>%
    browseStack(heights=c(3, 1, 2, 1, 2, 1, 1))

Package-wide settings can also be used (see ?tidyGenomeBrowser for a complete list):

# Change strand colors
options(tidyGenomeBrowser.strand=c(`+`="forestgreen",
                                   `-`="purple",
                                   `*`="grey"))

# Change x label:
options(tidyGenomeBrowser.prefix="Position on ")

# Transcripts
options(tidyGenomeBrowser.name=FALSE)
tx_track <- meta_models %>%
    browseTranscripts(region=promoter_window)

# Open chromatin
signal_track <- ATAC_signal %>% 
    browseSignal(region=promoter_window)

# More wiggle room when stacking
options(tidyGenomeBrowser.wiggle=100)
tc_track <- CAGE_TCs %>% 
    browseIntervals(region=promoter_window)
ctss_track <- CAGE_CTSSs %>% 
    browseSignal(region=promoter_window)

# All tracks together and margins removed
list(tx_track + ylab("Tx models"),
     signal_track + ylab("ATAC-Seq"),
     ctss_track + ylab("CTSSs"),
     tc_track + ylab("TCs")) %>%
    browseStack(heights=c(1, 2, 3, 1))

browseStack can also trim margins around tracks if needed:

# Save list of tracks
plot_list <- list(tx_track + ylab("Tx models"),
     signal_track + ylab("ATAC-Seq"),
     ctss_track + ylab("CTSSs"),
     tc_track + ylab("TCs"))

# Plot using differente squeezings
browseStack(plot_list, squeeze="none") + plot_annotation(title="Margins: No squeezing")
browseStack(plot_list, squeeze="internal") + plot_annotation(title="Margins: Squeeze between plots")
browseStack(plot_list, squeeze="all") + plot_annotation(title="Margins: Squeeze all margins")

Intercepting plotting data

All plotting functions can be intercepted, e.g. the raw data.frame can be returned rather than plotted:

# Intercept
ctss_data <- CAGE_CTSSs %>% 
    browseSignal(plot=FALSE)

# Look at the data
head(ctss_data)

# Plotting window stored as attribute?
#attributes(ctss_data)

After manipulation, data can be replotted:

ctss_data %>%
    mutate(strand == "*",
           color=sample(x=letters[1:2], size=n(), replace=TRUE)) %>%
    browseSignal(region=promoter_window)

Note that no checks on the data is performed at this stage: This makes tidyGenomeBrowser very hackable, but you can also potentially screw up!

Cookbook

Below are some common recipes and tips & tricks for manipulating ggplots.

Adding highlights

TBA

Adding common legend

TBA

Tweaking name

TBA...



MalteThodberg/tidyGenomeBrowser documentation built on Feb. 21, 2024, 8:39 p.m.