The coupling of chromosome conformation capture (3C)-based and next-generation sequencing (NGS) enable high-throughput detection of long-range genomic interactions via the generation of novel ligation products between DNA sequences that are closely juxtaposed in vivo. These interactions may involve promoter regions, enhancers and other regulatory and structural elements of chromosomes, and can reveal key details in the regulation of gene expression. 3C-seq is a variant of the method for the detection of interactions between one chosen genomic element (viewpoint) and the rest of the genome. We present a R/Bioconductor package designed to perform 3C-seq data analysis in a number of different experimental designs, with or without a control experiment. The package can also be used to perform data analysis for the experiment with replicates. The package provides functions to perform 3C-seq data normalization, statistical analysis for cis/trans interactions and visualization to facilitate the identification of genomic regions that physically interact with the given viewpoints of interest. The r3Cseq package greatly facilitates hypothesis generation and the interpretation of experimental results.
This vignette describes how to use the r3Cseq package. r3Cseq [@Supat:2013] is a Bioconductor-compliant R package designed to facilitate the identification of interaction regions generated by chromosome conformation capture and next-generation sequencing (3C-seq). The fundamental principles of 3C-seq briefly described in the following [@Soler:2010] , isolated cells are treated with a cross-linking agent to preserve in vivo nuclear proximity between DNA sequences. The DNA isolated from these cells is then digested using a primary restriction enzyme, typically a 6-base pairs cutting enzyme such as HindIII, EcoRI or BamHI. The digested product is then ligated under dilute conditions to favor intra-molecular over inter-molecular ligation events. This digested and ligated chromatin yields composite sequences representing (distal) genomic regions that are in close physical proximity in the cell nucleus. The digested and ligated chromatin is then de-crosslinked and subjected to a second restriction digest using either Nla III or Dpn II (a 4-cutter) as a secondary restriction enzyme to decrease the fragment sizes. The digested DNA is then ligated again under diluted conditions, creating small circular fragments. These fragments are inverse PCR-amplified using primers specific for a genomic region of interest (eg. promoter, enhancer, or any other element potentially involved in long-range interactions), termed the "viewpoint". The amplified fragments are then sequenced using massively parallel high-throughput sequencing. Because the 3C-seq procedure hybrid DNA molecules being a combination of viewpoint-specific primers followed by sequences dervied from the ligated interaction fragments. As such, these composite sequences are unmappable and need to be trimmed to removed the viewpoint sequences, thus leaving only the capture sequence fragments for mapping. After trimming, reads are mapped against a reference genome using alignment software such as Bowtie. A mapped read file generated by the mapping software is then transformed to the BAM file and analyzed using the r3Cseq package.
The required input is the BAM file, obtained as an output from the mapping software. The BAM file name must not contain any special symbols such as '*', '%', '$', '#', '!', '@', and "-". The "-" and "." must be replaced by using "_". The file name must be short and simple. For example, the original file name "%RIKO-R1-T5_S15_L001.trimmed_experiment.sorted.bam" must be changed to "RIKO_R1_T5_S15_L001.bam".
The represented identifier for a reference genome shown in each input BAM file is important to properly run r3Cseq. The represented identifier for each chromosome must be in "chr[1..19XYM]" format for the mouse reference genome and "chr[1..22XYM]" format for the human reference genome. Therefore, before using the package, a user has to check the identifier for the reference genome. If the identifier for each chromosome found in the mapped file is not in a proper format for example 'mm9_ref_chr01.fa', the Unix command like 'sed' might be used to replace 'mm9_ref_chr01.fa' to 'chr1'.
3C-seq data generated by [@Ralph:2012] will be used for the demonstration. The current version of r3Cseq supports mouse, human, and rat genomes. Therefore, the package requires one of the followings BSgenome packages to be installed; BSgenome.Mmusculus.UCSC.mm9.masked, BSgenome.Mmusculus.UCSC.mm10.masked, BSgenome.Hsapiens.UCSC.hg18.masked,BSgenome.Hsapiens.UCSC.hg19.masked, and BSgenome.Rnorvegicus.UCSC.rn5.masked.
Loading the r3Cseq package into R.
library(r3Cseq)
There are 2 example data sets found in the package.
data(Myb_prom_FL) data(Myb_prom_FB)
Myb_prom_FL, the 3C-seq data contains the aligned reads of the Myb promoter interactions signal in fetal liver. It was stored in the 'GRanges' object processed by the 'Rsamtools' package.
Myb_prom_FB, the 3C-seq data contains the aligned reads of Myb promoter interactions signal in fetal brain.
We will next perform r3Cseq to discover interaction regions, which possibly interact with the promoter region of Myb gene in both fetal liver and brain [@Ralph:2012].
In this section, we will analyze 3C-seq data, which were derived from fetal liver (high levels of the Myb gene expression) and fetal brain (expressing low level of the Myb gene). The latter will be used as a negative control. More examples of r3Cseq data analysis can be found on the r3Cseq website http://r3cseq.genereg.net. We firstly initialize the r3Cseq object.
my3Cseq.obj<-new("r3Cseq",organismName='mm9',isControlInvolved=TRUE, viewpoint_chromosome='chr10',viewpoint_primer_forward='TCTTTGTTTGATGGCATCTGTT', viewpoint_primer_reverse='AAAGGGGAGGAGAAGGAGGT',expLabel="Myb_prom_FL", contrLabel="MYb_prom_FB",restrictionEnzyme='HindIII')
The description of input parameters is described in the r3Cseq help page. We next add raw reads from "Myb_prom_FL" and "Myb_prom_FB" to the existing my3Cseq.obj.
expRawData(my3Cseq.obj)<-exp.GRanges contrRawData(my3Cseq.obj)<-contr.GRanges my3Cseq.obj
To get number of reads per restriction fragment, function getReadCountPerRestrictionFragment will be performed.
getReadCountPerRestrictionFragment(my3Cseq.obj)
The package also provides the function getReadCountPerWindow to count number of reads per non-overlapping window size defined by a user.
We next perform data normalization.
calculateRPM(my3Cseq.obj)
After normalization, the getInteractions function will be performed.
getInteractions(my3Cseq.obj,fdr=0.05)
To see the result of interaction regions, Two functions expInteractionRegions and contrInteractionRegions can be used to access the slot of r3Cseq object. To get the result of interaction regions for the experiment, expInteractionRegions will be performed.
fetal.liver.interactions<-expInteractionRegions(my3Cseq.obj) fetal.liver.interactions
To get the result of interaction regions for the control, contrInteractionRegions will be performed.
fetal.brain.interactions<-contrInteractionRegions(my3Cseq.obj) fetal.brain.interactions
To see the viewpoint information, getViewpoint function can be used. getViewpoint will return the GRanges object of the viewpoint information.
viewpoint<-getViewpoint(my3Cseq.obj) viewpoint
r3Cseq package provides visualization functions. These functions are plotOverviewInteractions, plotInteractionsNearViewpoint, plotInteractionsPerChromosome, and PlotDomainogramNearViewpoint.
plotOverviewInteractions function shows the overview of interaction regions distributed across genome.
plotOverviewInteractions(my3Cseq.obj)
plotInteractionsNearViewpoint function shows the zoom in of interaction regions located close to the viewpoint.
plotInteractionsNearViewpoint(my3Cseq.obj)
plotInteractionsPerChromosome function shows the interaction regions found in the chromosome10.
plotInteractionsPerChromosome(my3Cseq.obj,"chr10")
plotDomainogramNearViewpoint function shows the domainogram of interactions found in cis. This function may takes minutes to produce the domainogram plot.
plotDomainogramNearViewpoint(my3Cseq.obj,view="both")
getExpInteractionsInRefseq and getContrInteractionsInRefseq functions can be used to detect the list of genes that contain significant interaction signals in their proximity.
detected_genes<-getExpInteractionsInRefseq(my3Cseq.obj) head(detected_genes)
export3Cseq2bedGraph function exports all interactions from the GRanges object to the bedGraph format, which can be uploaded to the UCSC genome browser.
export3Cseq2bedGraph(my3Cseq.obj)
generate3CseqReport function generates the summary report from r3Cseq analysis results. The report contains a pdf file for all plots and text files of interaction regions. This function may takes minutes to produce the report.
generate3CseqReport(my3Cseq.obj)
The example of how to work with replicats is shown on http://r3cseq.genereg.net/ website.
http://r3cseq.genereg.net/ provides more details of r3Cseq analysis pipeline. The example data sets can be downloaded from the website.
Here is the output of sessionInfo on the system on which this document was compiled:
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.