In mireia-bioinfo/plotRegulome: Plot Islet Regulome data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)

Introduction

The plotRegulome package helps users produce and customize Islet Regulome Browser (IRB) plots.

Quick start

library(plotRegulome)

## Download IRB database
# dir <- "~/data/"
# path <- downloadIRB(output_dir=dir) # Download IRB data, returns path for database

## Define parameters for plotting
coordinates <- "chr5:50678921-51318155"
snps_dataset <- "diagram"
contacts_dataset <- "ISL1"
maps_dataset <- "openChromatinClasses"
clusters_dataset <- "enhancerClusters"
tfs_dataset <- "adult"

## Call plotRegulome function
plotRegulome(coordinates=coordinates,
             snps_dataset=snps_dataset,
             contacts_dataset=contacts_dataset,
             maps_dataset=maps_dataset,
             clusters_dataset=clusters_dataset,
             tfs_dataset=tfs_dataset,
             path="~/data/IRB_database/")

Datasets

The wrapper function downloadIRB downloads all the datasets needed to plot islet data with this package. You only need to provide the directory where the IRB datasets should be stored.

dir <- "~/data/"

downloadIRB(output_dir=dir) # Download IRB data
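To avoid repeating the download on every run, the call can be guarded by a check for the database folder. The following is a minimal sketch, assuming the download creates an IRB_database folder inside output_dir (as the paths used in this vignette suggest). ensure_irb is a hypothetical helper, not part of plotRegulome, and the download function is passed in as an argument so the sketch does not depend on the package being loaded.

```r
# Hypothetical wrapper (not part of plotRegulome): only call the downloader
# when the database folder is missing. In practice `downloader` would be
# downloadIRB; the folder name "IRB_database" is an assumption based on the
# paths used elsewhere in this vignette.
ensure_irb <- function(output_dir, downloader) {
  db_path <- file.path(output_dir, "IRB_database")
  if (!dir.exists(db_path)) {
    downloader(output_dir = output_dir)
  }
  db_path
}
```

With the package loaded, this could be used as path <- ensure_irb("~/data", downloadIRB).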

You will see that the downloaded data has the following structure:

IRB_database
|-- code
|-- hg19
|   |-- clusters
|   |-- genes
|   |-- maps
|   |-- snps
|   |-- tfs
|   `-- virtual4c
|       |-- 1
|       |-- 2
|       |-- 3
|       |-- 4
|       |-- 5
|       |-- 6
|       |-- 7
|       |-- 8
|       `-- 9
`-- shared
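A quick way to confirm that a download is complete is to compare the folders on disk with the layout shown above. This is a minimal base R sketch; missing_irb_folders is a hypothetical helper, not a plotRegulome function, and it only checks the first level below hg19.

```r
# Hypothetical helper (not part of plotRegulome): report which of the
# expected hg19 subfolders are missing under a given IRB_database path.
missing_irb_folders <- function(path) {
  expected <- c("clusters", "genes", "maps", "snps", "tfs", "virtual4c")
  present <- dir.exists(file.path(path, "hg19", expected))
  expected[!present]
}
```

For a complete download, missing_irb_folders("~/data/IRB_database") should return an empty character vector.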

Each folder inside hg19 contains the datasets related to selectable data, summarized in the table below:

library(knitr)
library(kableExtra)

dat <- read.delim("table_datasets.csv")

kable(dat[,-1],
      caption="Selectable datasets for plotting.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                font_size = 10) %>%
  pack_rows("Clusters", 1, 5) %>%
  pack_rows("Maps", 6, 9) %>%
  pack_rows("SNPs", 10, 12) %>%
  pack_rows("TFs", 13, 15)

The names used to call each dataset are listed in the Dataset column and in the documentation of plotRegulome.

Additionally, this package also includes gene annotation datasets (genes folder) that are shown by default:

library(knitr)
library(kableExtra)

dat <- read.delim("table_annotation.csv")

kable(dat[,-1],
      caption="Gene annotation datasets") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                font_size = 10)

Finally, virtual 4C data from @Miguel-Escalada2019 can be found in the virtual4c folder. Each bait is named with a numerical identifier, and baits are divided into folders according to the first digit of the ID. This prevents having too many files in one directory. The correspondence between bait IDs and bait names can be found in virtual4c/baitID_keyTable.rda.
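The folder for a given bait can be derived from its ID as described above. The sketch below is illustrative only: bait_folder is a hypothetical helper, the example IDs are made up, and the names of the files inside each folder are not covered here.

```r
# Hypothetical helper (not part of plotRegulome): build the virtual4c
# subfolder for one or more numeric bait IDs from the first digit of each ID.
bait_folder <- function(bait_id, path) {
  # Format as a plain integer string to avoid scientific notation.
  id_chr <- sprintf("%d", as.integer(bait_id))
  first_digit <- substr(id_chr, 1, 1)
  file.path(path, "hg19", "virtual4c", first_digit)
}

# Made-up bait IDs, for illustration only.
bait_folder(c(7, 1532, 90211), path = "IRB_database")
```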

Plot description

The plot shows regulatory regions, transcription factor binding sites, chromatin interaction data and GWAS variants, with genomic position along the horizontal axis.

In the upper part of the plot, a green line on the chromosome ideogram marks the portion of the chromosome displayed. Each dot represents a genomic variant, and the color intensity of the dot is proportional to the -log10 p-value of association, as indicated on the side of the plot. The rs IDs label the top associated variants in the locus.
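As a worked illustration of the -log10 scale, the sketch below converts a few p-values to scores and maps them onto a color ramp. The p-values are invented for the example, and the linear white-to-dark-red ramp is an assumption made here for clarity; the exact scaling used by the package may differ.

```r
# Illustrative p-values (made up for this sketch, not taken from any dataset).
p_values <- c(0.5, 1e-3, 5e-8)

# Association strength on the -log10 scale: smaller p-values give larger scores.
scores <- -log10(p_values)

# Map scores linearly onto a white-to-dark-red ramp (an assumed scaling);
# the strongest signal receives the darkest color.
ramp <- colorRamp(c("white", "darkred"))
dot_colors <- rgb(ramp(scores / max(scores)), maxColorValue = 255)

data.frame(p_values, scores, dot_colors)
```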

The virtual 4C data is visualized as an interaction frequency histogram, where the bait/viewpoint is depicted as a black triangle. A distal peak in the signal indicates a chromatin interaction with the bait/viewpoint. The interaction significance is depicted by a color code reflecting the CHiCAGO scores. These chromatin interaction maps were extracted from promoter capture Hi-C data centered on the queried bait region, hence the name virtual 4C.

The black box in the middle of the plot contains vertical colored bands depicting different chromatin states, open chromatin classes or regulatory elements as described in the legend above the plot. Black lines connecting the circles (each representing a different transcription factor) to the black box, point to the genomic location of each transcription factor binding site. The color intensity of such lines is proportional to the number of co-bound transcription factors.

Annotated genes are depicted as horizontal gray lines at the bottom of the plot. Boxes along the line correspond to positions of coding exons. The color of the box indicates the type of annotation: black boxes indicate lncRNAs, grey boxes indicate genes and purple boxes represent islet-specific genes.

Plotting the Islet Regulome plot

For a detailed description of all plot elements you can visit isletregulome.com and go to Info > Description of the plot, or read the section Plot description above.

In order to plot an Islet Regulome image, you will need to provide the coordinates of the region you want to plot, together with the names of the datasets you want to show. Please check ?plotRegulome or the Datasets section to see all available datasets and their code names.

library(plotRegulome)

coordinates <- "chr13:28354157-28550368"
snps_dataset <- "70KforT2D"
contacts_dataset <- "PDX1"
maps_dataset <- "chromatinClassesReduced"
clusters_dataset <- "enhancerHubs"
tfs_dataset <- "structure"

plotRegulome(coordinates=coordinates,
             snps_dataset=snps_dataset,
             contacts_dataset=contacts_dataset,
             maps_dataset=maps_dataset,
             clusters_dataset=clusters_dataset,
             tfs_dataset=tfs_dataset,
             path="~/data/IRB_database/")
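The coordinates string in these examples follows a chr:start-end pattern. The base R sketch below validates such a string before it is passed on, which can help catch typos early. parse_coordinates is a hypothetical helper, not part of plotRegulome, and the accepted chromosome names are an assumption.

```r
# Hypothetical helper (not part of plotRegulome): split a "chr:start-end"
# string into its components and fail early on malformed input.
parse_coordinates <- function(coordinates) {
  pattern <- "^(chr[0-9XYM]+):([0-9]+)-([0-9]+)$"
  if (!grepl(pattern, coordinates)) {
    stop("Expected coordinates like 'chr13:28354157-28550368'")
  }
  parts <- regmatches(coordinates, regexec(pattern, coordinates))[[1]]
  start <- as.numeric(parts[3])
  end <- as.numeric(parts[4])
  if (start > end) stop("Region start must not exceed region end")
  list(chr = parts[2], start = start, end = end, width = end - start + 1)
}

region <- parse_coordinates("chr13:28354157-28550368")
region$width
```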

You can save the Islet Regulome plots using the ggsave function from the ggplot2 package.

library(ggplot2)

ggsave(filename="example_plotRegulome.png",
       width=11, height=6)

Customizing Islet Regulome plot colors

Additional arguments to change the plotting color for the different elements can be used:

snps_col. Character with the name or the hex value for the darkest color to use when plotting SNPs. Default: "dark red".
contacts_col. Named vector with the colors for each grouping of CHiCAGO scores. Default: c("0"="grey", "3"="blue", "5"="dark orange")
cluster_col. Character with the name or the hex value for the color to use when plotting enhancer clusters. Default: "dark green".
tfs_col. Character with the name or the hex value for the TFs border. Default: "dark blue".
genes_col. Named vector with the colors for each type of gene annotation, where gene refers to gene annotation, lnc to lncRNAs and spec to islet-specific genes. Default: c(gene="dark grey", spec="darkorchid3", lnc="black")

plotRegulome(coordinates=coordinates,
             snps_dataset=snps_dataset,
             snps_col="mediumorchid4",
             contacts_dataset=contacts_dataset,
             contacts_col=c("0"="dark grey", "3"="skyblue3", "5"="olivedrab4"),
             maps_dataset=maps_dataset,
             clusters_dataset=clusters_dataset,
             cluster_col="dodgerblue4",
             tfs_dataset=tfs_dataset,
             tfs_col="violetred4",
             genes_col=c("gene"="thistle3", "lnc"="orangered3", "spec"="yellow3"),
             path="~/data/IRB_database/")
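The names of contacts_col ("0", "3", "5") suggest score thresholds. The sketch below shows one way such a grouping could work, assuming each name is the lower bound of a CHiCAGO score group; this interpretation and the example scores are assumptions for illustration, not a description of the package internals.

```r
# Default colors from the argument list above; names are treated here as
# lower bounds of CHiCAGO score groups (an assumption of this sketch).
contacts_col <- c("0" = "grey", "3" = "blue", "5" = "dark orange")
thresholds <- as.numeric(names(contacts_col))

# Made-up, non-negative scores for illustration.
scores <- c(1.2, 3, 4.9, 7)

# findInterval() returns the index of the last threshold <= each score.
group <- names(contacts_col)[findInterval(scores, thresholds)]

data.frame(score = scores, group = group, color = unname(contacts_col[group]))
```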

Random Islet Regulome plot generator

Finally, the plotRegulome function has an additional argument, randomIRB, that generates random plots from randomly chosen available datasets. For simplicity, contacts_dataset is not used, as it is unlikely that a bait located in the selected region would be picked at random.

Any specifications on datasets will be overwritten when randomIRB=TRUE. However, you can still specify colors for the different data as explained above.

set.seed(2384982)

plotRegulome(randomIRB=TRUE,
             path="~/data/IRB_database/")
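Setting the seed first makes the random selection reproducible. The sketch below illustrates the idea with the dataset names that appear in this vignette; pick_random is a hypothetical function written for this example and is not how randomIRB is implemented.

```r
# Dataset names that appear in this vignette; see ?plotRegulome for the
# complete lists.
available <- list(
  snps = c("diagram", "70KforT2D", "magic"),
  maps = c("openChromatinClasses", "chromatinClassesReduced"),
  clusters = c("enhancerClusters", "enhancerHubs", "superEnhancers"),
  tfs = c("adult", "structure")
)

# Hypothetical helper: draw one dataset name per data type for a given seed.
pick_random <- function(seed) {
  set.seed(seed)
  vapply(available, function(x) sample(x, 1), character(1))
}

# The same seed always yields the same selection.
identical(pick_random(2384982), pick_random(2384982))
```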

Plotting datasets separately

In case you are interested in plotting one or a few datasets from the Islet Regulome, this package also contains the necessary functions to do so.

The first step is to create an object with the dataset and type of data you want to plot. For this purpose, there is a set of creator functions that compute all the data needed to plot a specific dataset:

create_snpsRegulome. Generates a snpsRegulome object containing SNPs data in the region of interest.
create_contactsRegulome. Produces a contactsRegulome object containing regions in contact with the bait of interest and their CHiCAGO scores.
create_mapsRegulome. Produces a mapsRegulome object with the chromatin maps in the region of interest.
create_clustersRegulome. Produces a clustersRegulome object with information on the enhancer clusters in the region of interest.
create_tfsRegulome. Produces a tfsRegulome object with information on the TFBS in the region of interest.
create_genesRegulome. Generates gene annotation in a genesRegulome object containing information of genes in the region, islet lncRNAs and their islet specificity.

All of these functions take a coordinates argument, which is a GRanges object with the coordinates of the region you want to plot, and an xxx_dataset argument (except create_genesRegulome), which contains the dataset name as defined in the Datasets section above. Each of these functions has additional arguments, which you can check in the corresponding help pages (e.g. ?create_snpsRegulome).

To plot each dataset, pass the xxxRegulome object you just created to the plotR() function. This returns a list of ggplot layers that must be added to a ggplot() call to generate the plot.

library(ggplot2)
library(plotRegulome)

coordinates <- "chr20:22461643-22666093"

snps <- create_snpsRegulome(coordinates=coordinates,
                            snps_dataset="magic",
                            path="~/data/IRB_database/")


ggplot() + plotR(snps)

contacts <- create_contactsRegulome(coordinates=coordinates,
                                    contacts_dataset="FOXA2",
                                    path="~/data/IRB_database/")

ggplot() + plotR(contacts)

Other types of datasets, such as tfsRegulome or clustersRegulome, are of little use when plotted alone. You can plot them together as follows:

enh <- create_clustersRegulome(coordinates=coordinates,
                               clusters_dataset="superEnhancers",
                               path="~/data/IRB_database/")

maps <- create_mapsRegulome(coordinates=coordinates,
                            maps_dataset="chromatinClassesReduced",
                            path="~/data/IRB_database/")

tfs <- create_tfsRegulome(coordinates=coordinates,
                          tfs_dataset="adult",
                          path="~/data/IRB_database/")


comb_plot <- ggplot() + plotR(enh) + plotR(maps) + plotR(tfs) + ylim(0.4,NA)
comb_plot

If you want to include the gene annotation, generate that plot separately and then combine it with the maps using the plot_grid function from the cowplot package.

genes <- create_genesRegulome(coordinates=coordinates,
                              path="~/data/IRB_database/")

anno_plot <- ggplot() + plotR(genes)

library(cowplot)
cowplot::plot_grid(comb_plot + themeXblank(),
                   anno_plot,
                   ncol=1,
                   align="v",
                   rel_heights = c(0.6, 0.4))