Installation

Install the most recent stable version from Bioconductor:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("TeMPO")

And load TeMPO:

library(TeMPO)

Alternatively, you can install the development version directly from GitHub using devtools:

devtools::install_github("MalteThodberg/TeMPO")

Getting help and contact

For general questions about the usage of TeMPO, use the official Bioconductor support forum and tag your question "TeMPO". We strive to answer questions as quickly as possible.

For technical questions, bug reports and suggestions for new features, we refer to the TeMPO github page

Usage examples

Inputs and outputs

TeMPO can calculate the summarized signal (e.g. the average counts-per-millon) across a set of genomic regions (e.g. Transcription Start Sites (TSSs) and enhancers). TeMPO needs the following input:

TeMPO will output the result in "tidy" format as a tibble, making it easy to further analyze or plot the data using r BiocStyle::CRANpkg("ggplot2") and the r BiocStyle::CRANpkg("tidyverse") packages.

Example data sets

TeMPO example below use example data for hela cells (hg19) chromosomes 20-22 from the following sources:

First, we load the needed packages:

library(AnnotationHub)
library(tidyverse)

# Use light ggplot2 theme
theme_set(theme_light())

CAGE_clusters contain the location of TSSs and enhancer found using the CAGEfightR package. TSSs and enhancers are expected to show different epigenetic modifications, reflecting their different function:

data("CAGE_clusters")
CAGE_clusters

CAGE_plus and CAGE_minus contains the pooled CAGE signal across wildtype cells (WT) and exosome knock-out cell (KD). The exosome knock-down is expected to stabilize eRNA transcripts produced from enhancers that are normally quickly degraded. As the CAGE data is very sparse, it can be loaded directly into memory as an RleListList:

data("CAGE_plus")
data("CAGE_minus")
CAGE_plus

We can obtain epigenetic data for comparison to the CAGE data using the The Roadmap Epigenomics Project via the r BiocStyle::Biocpkg("AnnotationHub") package. As ChIP-Seq and DNase-signal is not sparse, we will store these signals as BigWigFile-objects rather than load them into memory:

NOTE: The first time you run this, it will take several minutes for AnnotationHub to download the file. After that, AnnotationHub will simply fetch the file from cache, as is done here.

# Setup AnnotationHub
ah <- AnnotationHub()

# Retrive data and save as BigWigFileList
ChIP_Seq <- list(DNase="AH32877",
     H3K4me1="AH32879",
     H3K4me3="AH32881",
     H3K27ac="AH32884") %>%
    lapply(function(x) ah[[x]]) %>%
    as("List")

Below are examples of generating metaprofiles for different combinations of input.

Single set of sites & single genome-wide signal

A simple example of (unstranded) DNAse-signal around promoters:

promoters_only <- subset(CAGE_clusters, txType == "promoter")

SS1 <- tidyMetaProfile(sites = promoters_only, 
                      forward=ChIP_Seq$DNase, reverse=NULL,
                      upstream=300, downstream=300)

head(SS1)

ggplot(SS1, aes(x=pos0, y=sense)) + 
    geom_line(alpha=0.75) +
    geom_vline(xintercept=0, alpha=0.75, linetype="dotted") +
    labs(x="Basepair position relative to center",
         y="Average DNase signal")

Classic example of bidirectional transcription of eRNAs at enhancers shown using a stranded meta-profile:

enhancers_only <- subset(CAGE_clusters, clusterType == "enhancer")

SS2 <- tidyMetaProfile(sites = enhancers_only, 
                      forward=CAGE_plus$WT, reverse=CAGE_minus$WT,
                      upstream=250, downstream=250)

head(SS2)

SS2 %>%
    gather(key="direction", value="score", sense, anti, factor_key=TRUE) %>%
    ggplot(aes(x=pos0, y=score, color=direction)) + 
    scale_color_brewer("Direction", palette="Set1") +
    geom_line(alpha=0.75) +
    geom_vline(xintercept=0, alpha=0.75, linetype="dotted") +
    labs(x="Basepair position relative to center",
         y="Average CAGE signal")

Single set of sites & multiple genome-wide signal

Multiple different epigentic signals across a single set of sites:

SM <- tidyMetaProfile(sites = promoters_only, 
                      forward=ChIP_Seq, reverse=NULL,
                      upstream=300, downstream=300)

head(SM)

ggplot(SM, aes(x=pos0, y=sense, color=signal)) + 
    geom_line(alpha=0.75) +
    geom_vline(xintercept=0, alpha=0.75, linetype="dotted") +
    labs(x="Basepair position relative to center",
         y="Average CAGE signal")

Multiple sets of sites & single genome-wide signal

H3K27ac at CAGE-defined TSSs at different positions in genes:

by_txType <- CAGE_clusters %>%
    subset(clusterType == "TSS" & txType %in% c("promoter", 
                                                "fiveUTR", 
                                                "proximal")) %>%
    splitAsList(.$txType, drop=TRUE)

MS <- tidyMetaProfile(sites = by_txType, 
                      forward=ChIP_Seq$DNase, reverse=NULL,
                      upstream=300, downstream=300)

head(MS)

ggplot(MS, aes(x=pos0, y=sense, color=sites)) + 
    geom_line(alpha=0.75) + 
    geom_vline(xintercept = 0, linetype="dotted", alpha=0.75) +
    scale_color_brewer("Genic Context", palette="Set2") +
    labs(x="Relative position from center", 
         y="Average H3K27ac Signal")

Multiple sets of sites & multiple genome-wide signal

Meta-profile showing the difference in epigentic modifications at TSSs vs enhancers:

by_clusterType <- split(CAGE_clusters, CAGE_clusters$clusterType)

MM1 <- tidyMetaProfile(sites = by_clusterType, 
                      forward=ChIP_Seq, reverse=NULL,
                      upstream=300, downstream=300)

head(MM1)

ggplot(MM1, aes(x=pos0, y=sense, color=sites)) + 
    geom_line(alpha=0.75) + 
    facet_grid(signal~., scales="free_y") +
    labs(x="Relative position from center", 
         y="Average Signal")

Meta-profile showing the effect of the exosome knockdown on eRNAs detected by CAGE:

MM2 <- tidyMetaProfile(sites = by_clusterType, 
                      forward=CAGE_plus, reverse=CAGE_minus,
                      upstream=300, downstream=300)

head(MM2)

MM2 %>%
    gather(key="direction", value="score", sense, anti, factor_key=TRUE) %>%
    ggplot(aes(x=pos0, y=score, color=direction)) + 
    geom_line(alpha=0.75) + 
    facet_grid(sites~signal, scales="free_y") +
    scale_color_brewer("Direction", palette="Set1") +
    labs(x="Relative position from center", 
         y="Average Signal")

Advanced usage

Parallel Execution

In case multiple genomic signals are provided (Multiple BigWigFile-objects in a BigWigFileList or multiple RleList-objects in an RleListList), signals can be analyzed in parallel using the r BiocStyle::Biocpkg("BiocParallel") package. TidyMetaProfile uses the default registered backend:

library(BiocParallel)

# Set the backend to run calculations in series
register(SerialParam())

# Set the backend to run parallelize calculations using i.e. 3 cores:
register(MulticoreParam(workers=3))

Alternative meta-summary functions

Instead of calculating the mean across sites, alternative summary functions can be provide. For example, instead of the average meta-profile, we can plot the median meta-profile:

# Recalculate the first example using medians
SS1_median <- tidyMetaProfile(sites = promoters_only, 
                      forward=ChIP_Seq$DNase, reverse=NULL,
                      upstream=300, downstream=300,
                      sumFun = matrixStats::colMedians)

# Merge the two profiles and plot
list(mean=SS1, median=SS1_median) %>% 
    bind_rows(.id="summary") %>%
    ggplot(aes(x=pos0, y=sense, color=summary)) +
    geom_line(alpha=0.75) +
    geom_vline(xintercept=0, alpha=0.75, linetype="dotted") +
    scale_color_discrete("Summary-function") +
    labs(x="Basepair position relative to center",
         y="Average DNase signal")

Outlier removal/trimming

In many cases, a few sites may have very extreme values which can disproportionally skew the calculated average profiles. tidyMetaProfile can automatically trim the sites with lowest and/or highest signals based on quantiles:

# Recalculate the first example with different quantile trimmings:
SS1_95percent <- tidyMetaProfile(sites = promoters_only, 
                      forward=ChIP_Seq$DNase, reverse=NULL,
                      upstream=300, downstream=300,
                      trimUpper=0.95)

SS1_90percent <- tidyMetaProfile(sites = promoters_only, 
                      forward=ChIP_Seq$DNase, reverse=NULL,
                      upstream=300, downstream=300,
                      trimUpper=0.90)

# Merge the three profiles and plot
list(`100%`=SS1, 
     `95%`=SS1_95percent, 
     `90%`=SS1_90percent) %>% 
    bind_rows(.id="summary") %>%
    ggplot(aes(x=pos0, y=sense, color=summary)) +
    geom_line(alpha=0.75) +
    geom_vline(xintercept=0, alpha=0.75, linetype="dotted") +
    scale_color_discrete("Trimming-level") +
    labs(x="Basepair position relative to center",
         y="Average DNase signal")

Access to low-level functions

The low-level functions used by tidyMetaProfile are all exposed for advanced user:

The man pages for those functions contains details on how to use them.

Session Info

sessionInfo()


MalteThodberg/TeMPO documentation built on May 15, 2019, 11:48 a.m.