There are 2 kinds of experimental designs that can be used for a metagene
analysis. This document will introduce the second kind, which is the comparison
of the same group of regions between 2 ChIP-Seq experiments.
metagene was used to robustly test for the difference in the enrichment of
the H3K27ac and H4K27me3 histones marks, for the same active transcription start
sites (TSS), in in A549 cells. More information about those data can be
found here :
http://egg2.wustl.edu/roadmap/web_portal/chr_state_learning.html#exp_18state
In R, we must first load the packrat package, which enable data usage, and metagene package:
suppressMessages(library(packrat)) packrat::on("../") suppressMessages(library(metagene))
We will use the TSS as defined by the Roadmap Consortium (Romanoski et al., 2015). For more details on how the data were downloaded, please see the R/get_regions.R file.
Only one groupe of active TSS will be used.
#data(TssA) # Active TSS load("../data/TssA.RData") # Active TSS # We will make sure that all the regions have the same size to avoid having to # scale them during the metagene analysis TssA <- GenomicRanges::resize(TssA, 1000, fix = "center") regions <- GenomicRanges::GRangesList(TssA) names(regions) <- c("TssA") # We need to remove values from chrY that are not present in some bam files regions <- lapply(regions, function(x) x[seqnames(x) != "chrY"]) # For memory and speed consideration, we will only use a subset of the regions regions <- lapply(regions, function(x) x[sample(seq_along(x), 1000)]) regions <- GenomicRanges::GRangesList(regions) regions
For the ChIP-Seq of the H3K27ac histone marks, we will use the ENCFF000AKF
and
ENCFF000AKI
files from ENCODE (ENCODE Project Consortium, 2012). The
ENCFF000AKF.bam
and ENCFF000AKI.bam
files
are two replicates from the same experiment.
For the ChIP-Seq of the H3K27me3 histone marks, we will use the ENCFF000AHC
and ENCFF000AHD
files from ENCODE (ENCODE Project Consortium, 2012). The
ENCFF000AHC.bam
and ENCFF000AHD.bam
files
are also two replicates from the same experiment.
bam_files <- c("../inst/extdata/ENCFF000AKF.bam", "../inst/extdata/ENCFF000AKI.bam", "../inst/extdata/ENCFF000VGB.bam", "../inst/extdata/ENCFF000VFS.bam") bam_files
The bam_files object should have 4 elements.
Multiple samples should be included in the same profile. Replicates need to be
combined. On group is created for each histone. The design has to be
constructed so that metagene
will process correctly each file.
design <- data.frame(samples = bam_files, H3K27ac = c(1,1,0,0), H3K27me3 = c(0,0,1,1)) design
The design
uses the value 1
for a dataset and 2
for a control.
The first step is to create a metagene
object using metagene$new()
function. The goal of this first metagene step is to extract the coverage
from all BAM files present in the bam_files
object and to normalize the
signal. Since this step can be computationally intensive, we do not want to do
it every time we want to experiment with a new design. Thus, it is a good idea
to extract the coverage from every BAM files in a single step, save the
results in RData
format and then explore different designs. The design
object will be used in next steps..
#mg <- metagene$new(regions = regions, bam_files = bam_files, cores = 2) mg <- metagene$new(regions = regions, bam_files = bam_files) # We could save this object to avoid re-doing this computationally intensive # step: #save(mg, file = "mg.RData")
The plot()
function allows us to create a metagene graph. By default, the
graph will contain a profile for each combination of group of regions and BAM
file. In this vignette, we have 4 BAM files (2 replicates for H3K27ac and 2
replicates for H3K27me3) and one group of regions (TssA
). It means that if
we do not use a design, metagene will produce 4 profiles.
In this case, we only want to produce a profile for each histone mark (i.e. we
want to combine the replicates). The groups of BAM files have already been
defined in the design
object (see step 4).
df <- mg$plot(design = design, bin_size = 10)
Figure S5. H3K27ac versus H3K27me3
The list
returned by the plot()
function can be saved and reused.
# We could save the list returned by the plot() function and # use it to create a new graph # save(df, file = "df.RData")
metagene has statistically detected differences between those two histone marks profiles using Friedman rank sum test. Evenmore, metagene has robustly computed confidence intervals (CI) of the estimators (mean) for each bin using a bootstrap method.
ENCODE Project Consortium (2012). An integrated encyclopedia of DNA elements in the human genome. Nature, 489: 57–74. doi: 10.1038/nature11247. pmid:22955616
Roadmap Epigenomics Consortium et al. (2015). Integrative analysis of 111 reference human epigenomes. Nature, 518: 317–330.
Romanoski, C. E. et al. (2015). Epigenomics: Roadmap for regulation. Nature, 518: 314–316. doi: 10.1038/518314a. pmid:25693562
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.