Enrichment of the H3K27ac and H4K27me3 histone marks in active transcription start sites (TSS)

Introduction

There are 2 kinds of experimental designs that can be used for a metagene analysis. This document will introduce the second kind, which is the comparison of the same group of regions between 2 ChIP-Seq experiments.

metagene was used to robustly test for the difference in the enrichment of the H3K27ac and H4K27me3 histones marks, for the same active transcription start sites (TSS), in in A549 cells. More information about those data can be found here :
http://egg2.wustl.edu/roadmap/web_portal/chr_state_learning.html#exp_18state

1. Load metagene package

In R, we must first load the packrat package, which enable data usage, and metagene package:

suppressMessages(library(packrat))
packrat::on("../")
suppressMessages(library(metagene))

2. Load TSS of interests

We will use the TSS as defined by the Roadmap Consortium (Romanoski et al., 2015). For more details on how the data were downloaded, please see the R/get_regions.R file.

Only one groupe of active TSS will be used.

#data(TssA)   # Active TSS
load("../data/TssA.RData")   # Active TSS

# We will make sure that all the regions have the same size to avoid having to
# scale them during the metagene analysis
TssA <- GenomicRanges::resize(TssA, 1000, fix = "center")
regions <- GenomicRanges::GRangesList(TssA)
names(regions) <- c("TssA")

# We need to remove values from chrY that are not present in some bam files
regions <- lapply(regions, function(x) x[seqnames(x) != "chrY"])

# For memory and speed consideration, we will only use a subset of the regions
regions <- lapply(regions, function(x) x[sample(seq_along(x), 1000)])
regions <- GenomicRanges::GRangesList(regions)
regions

3. Loading ChIP-Seq datasets

For the ChIP-Seq of the H3K27ac histone marks, we will use the ENCFF000AKF and ENCFF000AKI files from ENCODE (ENCODE Project Consortium, 2012). The ENCFF000AKF.bam and ENCFF000AKI.bam files are two replicates from the same experiment.

For the ChIP-Seq of the H3K27me3 histone marks, we will use the ENCFF000AHC and ENCFF000AHD files from ENCODE (ENCODE Project Consortium, 2012). The ENCFF000AHC.bam and ENCFF000AHD.bam files are also two replicates from the same experiment.

bam_files <- c("../inst/extdata/ENCFF000AKF.bam",
               "../inst/extdata/ENCFF000AKI.bam",
               "../inst/extdata/ENCFF000VGB.bam",
               "../inst/extdata/ENCFF000VFS.bam")
bam_files

The bam_files object should have 4 elements.

4. Experimental design

Multiple samples should be included in the same profile. Replicates need to be combined. On group is created for each histone. The design has to be constructed so that metagene will process correctly each file.

design <- data.frame(samples = bam_files,
                     H3K27ac = c(1,1,0,0),
                     H3K27me3 = c(0,0,1,1))
design

The design uses the value 1 for a dataset and 2 for a control.

5. metagene analysis

The first step is to create a metagene object using metagene$new() function. The goal of this first metagene step is to extract the coverage from all BAM files present in the bam_files object and to normalize the signal. Since this step can be computationally intensive, we do not want to do it every time we want to experiment with a new design. Thus, it is a good idea to extract the coverage from every BAM files in a single step, save the results in RData format and then explore different designs. The design object will be used in next steps..

#mg <- metagene$new(regions = regions, bam_files = bam_files, cores = 2)
mg <- metagene$new(regions = regions, bam_files = bam_files)
# We could save this object to avoid re-doing this computationally intensive
# step:
#save(mg, file = "mg.RData")

6. metagene graph H3K27ac vs H3K27me3

The plot() function allows us to create a metagene graph. By default, the graph will contain a profile for each combination of group of regions and BAM file. In this vignette, we have 4 BAM files (2 replicates for H3K27ac and 2 replicates for H3K27me3) and one group of regions (TssA). It means that if we do not use a design, metagene will produce 4 profiles.

In this case, we only want to produce a profile for each histone mark (i.e. we want to combine the replicates). The groups of BAM files have already been defined in the design object (see step 4).

df <- mg$plot(design = design, bin_size = 10)

Figure S5. H3K27ac versus H3K27me3

The list returned by the plot() function can be saved and reused.

# We could save the list returned by the plot() function and
# use it to create a new graph
# save(df, file = "df.RData")

metagene has statistically detected differences between those two histone marks profiles using Friedman rank sum test. Evenmore, metagene has robustly computed confidence intervals (CI) of the estimators (mean) for each bin using a bootstrap method.

References

ENCODE Project Consortium (2012). An integrated encyclopedia of DNA elements in the human genome. Nature, 489: 57–74. doi: 10.1038/nature11247. pmid:22955616

Roadmap Epigenomics Consortium et al. (2015). Integrative analysis of 111 reference human epigenomes. Nature, 518: 317–330.

Romanoski, C. E. et al. (2015). Epigenomics: Roadmap for regulation. Nature, 518: 314–316. doi: 10.1038/518314a. pmid:25693562



CharlesJB/metageneVignettes documentation built on May 6, 2019, 9:58 a.m.