erma: epigenomics road map adventures

BiocStyle::markdown()
suppressPackageStartupMessages({
library(rtracklayer)
library(BiocStyle)
library(erma)
library(GenomicFiles)
library(ggplot2)
library(grid)
library(png)
library(DT)
})

Introduction

The epigenomics road map describes locations of epigenetic marks in DNA from a variety of cell types. Of interest are locations of histone modifications, sites of DNA methylation, and regions of accessible chromatin.

This package presents a selection of elements of the road map including metadata and outputs of the ChromImpute procedure applied to ENCODE cell lines by Ernst and Kellis.

Metadata on the ChromImpute archive

Sample metadata

I have retrieved a Google Docs spreadsheet with comprehensive information. The mapmeta() function provides access to a local DataFrame image of the file as retrieved in mid April 2015. We provide a dynamic view of a selection of columns. Use the search box to filter records shown, for example \code{LUNG}.

library(DT)
library(erma)
meta = mapmeta()
kpc = c("Comments", "Epigenome.ID..EID.", "Epigenome.Mnemonic", "Quality.Rating", 
"Standardized.Epigenome.name", "ANATOMY", "TYPE")
datatable(as.data.frame(meta[,kpc]))

Metadata on the inferred states

The chromatin states and standard colorings used are enumerated in states_25:

data(states_25)
datatable(states_25)

The emission parameters of the 25 state model are depicted in the supplementary Figure 33 of Ernst and Kellis:

library(png)
im = readPNG(system.file("pngs/emparms.png", package="erma"))
grid.raster(im)

Managing access to imputed chromatin states for a set of cell types

I have retrieved a modest number of roadmap bed files with ChromImpute mnemonic labeling of chromatin by states. These can be managed with an ErmaSet instance, a trivial extension of r Biocpkg("GenomicFiles") class. The cellTypes method yields a character vector. The colData component has full metadata on the cell lines available.

ermaset = makeErmaSet()
ermaset
cellTypes(ermaset)[1:5]
datatable(as.data.frame(colData(ermaset)[,kpc]))

Enumerating states in the vicinity of a gene, across cell types

We form a GRanges representing 50kb upstream of IL33.

uil33 = flank(resize(range(genemodel("IL33")), 1), width=50000)
uil33

Bind this to the ErmaSet instance.

rowRanges(ermaset) = uil33  
ermaset

Now query the files for cell-specific states in this interval.

library(BiocParallel)
register(MulticoreParam(workers=2))
suppressWarnings({
csstates = lapply(reduceByFile(ermaset, MAP=function(range, file) {
  imp = import(file, which=range, genome=genome(range)[1])
  seqlevels(imp) = seqlevels(range)
  imp$rgb = erma:::rgbByState(imp$name)
  imp
}), "[[", 1) 
})
tys = cellTypes(ermaset)  # need to label with cell types
csstates = lapply(1:length(csstates), function(x) {
   csstates[[x]]$celltype = tys[x]
   csstates[[x]]
   })
csstates[1:2]

This sort of code underlies the csProfile utility to visualize variation in state assignments in promoter regions for various genes.

csProfile(ermaset[,1:5], symbol="CD28", useShiny=FALSE)

Set useShiny to TRUE to permit interactive selection of region to visualize.



Try the erma package in your browser

Any scripts or data that you put into this service are public.

erma documentation built on Nov. 8, 2020, 5:39 p.m.