knitr::opts_chunk$set(tidy = FALSE, warning = FALSE, message = FALSE) CRANpkg <- function (pkg) { cran <- "https://CRAN.R-project.org/package" fmt <- "[%s](%s=%s)" sprintf(fmt, pkg, cran, pkg) } Biocpkg <- function (pkg) { sprintf("[%s](http://bioconductor.org/packages/%s)", pkg, pkg) } Biocannopkg <- Biocpkg
library(BMplot) library(BSgenome.Athaliana.TAIR.TAIR9) library(BSgenome.Hsapiens.UCSC.hg19) library(ChIPseeker) library(DSS) library(ggplot2)
BMplot
is an R package for visualizing Base modification data. Base modification includes DNA methylation and RNA methylation. BMplot
provides functions to organize the information of base modification and visualize it.
Epigenetics plays an important role in Biology in the way of changing genetic information without changing nucleic acids. Histone modification and Methylation are two vital mechanisms in Epigenetics.
Many methods are used to detect the methylation in Epigenentics with the combination of sequencing and chemical methods. Me‑DIP
and oxBS‑seq
for 5mC
. TAB-seq
and hme‑DIP
for 5hmC
. 6mA‑DIP
and 6mA‑RE-seq
for 6mA
. These methods can be divided into two categories, including Antibody enrichment techniques
for base fragments and Single-base resolution techniques
for single base. BMplot is designed to visualize the Single-base resolution techniques
.
## loading package library(BMplot) library(BSgenome.Athaliana.TAIR.TAIR9) library(BSgenome.Hsapiens.UCSC.hg19) library(ChIPseeker) library(DSS) library(ggplot2)
BMplot
provides functions to achieve the goal of visualizing base modification. getBaseModificationDf()
can get the information of base modification and organize it into the form of data frame. plotBaseModificationProf()
use ggplot2 methods to visualize the base modification according to data frame created by getBaseModificationDf()
. bmData()
is a constructor for bmData object, which storing any base modification data including methylation value and read coverage value.
The input data contains three main parts. region
is the region information of base modification organized in the form of data frame. input
is the data structure which storing base modification data, including BSseq object
and bmData object
. BSgenome
is the sequence information provided by a set of BSgenome
package.
BSseq
is a class defined by r Biocpkg("bsseq")
[@Kasper_2012], representing whole-genome or capturing bisulfite sequencing data. BSseq
can be constructed by bsseq::BSseq()
. It can also be created by r Biocpkg("DSS")
[@Hao_Wu_2013;@Hao_Wu_2014;@Hao_Wu_2015;@Hao_Wu_2016], using DSS::makeBSseqData()
. These two constructors above are also exported in BMplot. Other constructors can also be used to construct BSseq object.
bmData object is similar to BSseq object, storing any base modification information, like IPD value[@flusberg2010direct] and so on, for the reason that BSseq object can only store information of methylation and read depth. There are three functions provided by BMplot
including one constructor and two functions enclosing the constructor.
bmData()
is the constructor of bmData object. Users can organize the data of interest into matrix-like object. For brief, users should extract each kind of value from the combined results. For example, users may get the combined results in the form of two matrix-like objects. Each object represent one sample and contain two value, like methylation value and depth value. User should first extract the methylation value from each sample and organize all the methylation value into one matrix-like object. Then do the same thing to depth value.
bmData <- function(value1 = NULL, value2 = NULL, pos = NULL, chr = NULL, gr = NULL, sampleNames = NULL, valueNames = NULL, ...)
For the sake of convenience, BMplot
provide two functions encloseing the constructor. Users can use makebmDataFromData()
to make bmData object from data. The data should be organized in one of the Granges
, GrangeList
, data.frame
and list
structure.
# simulated grange and grangelist object files <- getSampleFiles() gr1 <- readPeakFile(files[[4]]) mcols(gr1)[[1]] <- runif(length(gr1)) # make bmData from grange object dmData_grange <- makebmDataFromData(data = gr1, sampleNames = c("GSM1295076_CBX6_BF")) # make bmData from grangelist gr2 <- readPeakFile(files[[5]]) mcols(gr2)[[1]] <- runif(length(gr2)) gr_list <- GRangesList(gr1,gr2) dmData_grangelist <- makebmDataFromData(data = gr_list, sampleNames = c("GSM1295076_CBX6_BF", "GSM1295077_CBX7_BF")) # make bmData from data.frame df <- data.frame(chr=as.character(seqnames(gr1)),pos=start(gr1), value1= mcols(gr1)[[1]], value2= mcols(gr1)[[2]]) df_bmData <- makebmDataFromData(data = df, sampleNames = c("GSM1295076_CBX6_BF")) # make bmData from list df2 <- data.frame(chr=as.character(seqnames(gr2)),pos=start(gr2), value1= mcols(gr2)[[1]], value2= mcols(gr2)[[2]]) df_list <- list(df,df2) df_list_bmData <- makebmDataFromData(data = df_list, sampleNames = c("GSM1295076_CBX6_BF", "GSM1295077_CBX7_BF"))
Users can also use makebmDataFromFiles()
to make bmData object from file.
filefolder <- SampleFileFolder() # test folder with bed files bed_folder_bmData <- makebmDataFromFiles(name = filefolder[[1]], sampleNames = c("GSM1295076_CBX6_BF", "GSM1295077_CBX7_BF"), variablesNames = c("IPD_ratio","IPD")) bedfile <- SamepleBedFiles() # test bed file bed_bmData <- makebmDataFromFiles(name = bedfile[[1]], sampleNames = c("GSM1295076_CBX6_BF"), variablesNames = c("IPD_ratio","IPD")) # test folder with txt files txt_folder_bmData <- makebmDataFromFiles(name = filefolder[[2]], sampleNames = c("GSM1295076_CBX6_BF", "GSM1295077_CBX7_BF"), variablesNames = c("IPD_ratio","IPD")) txt_file <- SamepleTxtFiles() # test txt file txt_bmData <- makebmDataFromFiles(name = txt_file[[1]], sampleNames = c("GSM1295076_CBX6_BF"), variablesNames = c("IPD_ratio","IPD"))
The main part of the input data of plotBaseModificationProf()
is the data frame created by getBaseModificationDf()
.The output data of is a ggplot object.
DNA Methylation is one of the base modifications. It is well worthy to investigate the different methylation region(dmR) in DNA methylayion. We downloaded data from GEO(GSE52140)[@hascher2014dna] and organized the data in to Human_BSobj.rda
and Human_dmR.rda
. The procession script was put in ./data-raw/Human_data_procession.R
. We use the visualization of different methylation region in Homo sapiens to describe the work flow.
We use r Biocpkg("DSS")
[@Hao_Wu_2013;@Hao_Wu_2014;@Hao_Wu_2015;@Hao_Wu_2016] to detect dmR and create BSseq object.
We extracted the information from BSseq object and mapped to the sequence data stored in BSgenome.Hsapiens.UCSC.hg19
. Then we extracted the motif and strand information and organized it in the form of data frame. Cytosine was the interesting base to detect and the CG, CHH and CHG
were the motifs to investigate.
library(BSgenome.Hsapiens.UCSC.hg19) BSgenome_human <- BSgenome.Hsapiens.UCSC.hg19 # Human_dmR and Human_BSobj are internal sample data human_df <- getBaseModificationDf(region = Human_dmR[1,], BSgenome = BSgenome_human, input = Human_BSobj, base = "C", motif = c("CG","CHH","CHG")) head(human_df)
The data frame have six columns. coordinate
stands for the the coordinate of each base in the region. motif
stands for the motif of the methylation. strand
stands for the strand information of the base modification. sample
stands for the sample name of the base modification sample. value
stands for the coverage depth of each base or the methylation value of each base. type
decides the category should be depth value or methylation value.
plotBaseModificationProf(human_df,switch_y_value = T)
We downloaded data from GEO(GSE62206) and organized the data in to A_thaliana_BSobj.rda
and A_thaliana_dmR.rda
. The procession script was put in ./data-raw/A_thaliana_data_procession.R
.
We use the same procession above to get the base modification of A_thaliana. The only difference is that we can get multiple motifs rather than just one motif above.
library(BSgenome.Athaliana.TAIR.TAIR9) BSgenome_thaliana <- BSgenome.Athaliana.TAIR.TAIR9 # A_thaliana_dmR and A_thaliana_BSobj are internal sample data thaliana_df <- getBaseModificationDf(region = A_thaliana_dmR, BSgenome = BSgenome_thaliana, input = A_thaliana_BSobj, base = "C", motif = c("CG","CHG","CHH")) head(thaliana_df)
plotBaseModificationProf(thaliana_df,switch_y_value = T)
The colors we use to visualize motif are the default colors in r CRANpkg("ggplot2")
[@Hadley_2016]. Users can specify the colors of interest through motif_color
. The order of color maps the order of the motifs on the picture.
plotBaseModificationProf(thaliana_df, motif_color = c("#993366","#3300CC","#CC6600"), switch_y_value = T)
BMplot
will put the value on y axis by default. For example, depth value was put on the left axis and methylation value was put on the right axis when ploting the methylation profile. Users can switch the depth value from left to right and methylation value from right to left by setting the switch_y_value
to TRUE.
Normal motifs, like CG
and CHG
, have the base in the first position. However, we would visualize others special motifs in the research of Epigenetics. We would investigate adenine in GAGG
, which having the interesting base in the second position. position_bias
comes to solve this problem.
BMplot provide simulated data to illustrate this parameter. We simulated the base modification data of adenine and put the procession in the ./data-raw/simulated_A_thaliana_data_procession.R
.
# simulated_dmR and simulated_BSobj are internal sample data df_simulate <- getBaseModificationDf(region = simulated_dmR, input = simulated_BSobj, BSgenome = BSgenome_thaliana, motif = c("GAGG","ANYGA"), position_bias = c(2,5), base = "A") plotBaseModificationProf(df_simulate)
The picture above visualized the interesting motif,
ANYGA
. Users can visualize any motifs of interests through position_bias
.
BMplot supports to visualize a list of dmRs.
df_list <- getBaseModificationDf(region = Human_dmR[1:2,], BSgenome = BSgenome_human, input = Human_BSobj, base = "C", motif = c("CG","CHH","CHG")) ## nrow control the layout of pictures plotBaseModificationProf(df_list,nrow = 1,switch_y_value = T)
There are many kinds of base modifications. For example, the 5mC
and 5hmC
in cytosine modification and the 6mA
in adenine modification. Hence, there is a potential need to visualize two kinds of modifications in one picture. Functions in BMplot meet this need.
We use two sets of data focusing in 5mC
and assume that one is data for 5mC
and the other is for 5hmC
. The method to visualize two kinds of base modification is to assign different titles to different pictures.
df_list1 <- getBaseModificationDf(region = Human_dmR[1:2,], BSgenome = BSgenome_human, input = Human_BSobj, base = "C", motif = c("CG","CHH","CHG")) plotBaseModificationProf(df_list1, title = c("5mC Methylation","5hmC Methylation"), nrow = 1, switch_y_value = T)
Base modification can produce many information including not only depth value and methylation value but also IPD value. IPD and IPD ratio are important indicators in single-molecule real-time sequencing. We can distinguish modified bases with it[@flusberg2010direct]. Visualizing IPD and IPD ratio in different motifs is a worthy attempt.
The following data is simulated from GSE62206 by assigning random IPD value and IPD ratio value to each base in GSE62206. Since the BSseq
object can only store data of methylation(X) and read depth(N), we create a bmData
object to hold any information of base modification.
bmData_A_thaliana <- makebmDataFromData(data = simulated_IPD, sampleNames = c("GSM1522201_WT", "GSM1522201_ddm1")) bmData_df <- getBaseModificationDf(region = A_thaliana_dmR[1,], input = bmData_A_thaliana, BSgenome = BSgenome.Athaliana.TAIR.TAIR9::Athaliana, motif = c("CG","CHH","CHG"), base = "C") plotBaseModificationProf(bmData_df,switch_y_value = T, ylab = "IPD_ratio(%)",second_ylab = "IPD(s)")
Users can use the function of highlight to emphasize the region with the help of BMplot
.
plotBaseModificationProf(thaliana_df,switch_y_value = T, highlight_lim = c(5651586,5651913))
This highlight function can also be applied to list regions.
BSgenome_human <- BSgenome.Hsapiens.UCSC.hg19 df_list <- getBaseModificationDf(region = Human_dmR[1:2,], BSgenome = BSgenome_human, input = Human_BSobj, base = "C", motif = c("CG","CHH","CHG")) plotBaseModificationProf(df_list,nrow = 1,switch_y_value = T, highlight_lim = list(c(1100262,1100463), c(1173154,1173300)))
After plotting the profile of base modification, users can use annotation database to annotate the region of base modification in the form of gene track. Users can input annotation database to GeneModel
parameter, and then BMplot
will use ggbio
[@Tengfei_ggbio] to plot the gene track according to GeneModel
parameter. GeneModel
can be OrganismDb object, TxDb object, EnsDb object or GrangeList object.
BSgenome_human <- BSgenome.Hsapiens.UCSC.hg19 txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene region <- Human_dmR[1,] region <- region[,c(1:3)] region[1,c(2,3)] <- c(10000,25000) human_df <- getBaseModificationDf(region = region, BSgenome = BSgenome_human, input = Human_BSobj, base = "C", motif = c("CG","CHH","CHG")) plotBaseModificationProf(human_df, switch_y_value = T, GeneModel = txdb, highlight_lim = c(10497,11000))
The motif
parameter has its own order. getBaseModificationDf()
will do the annotation in the order of motif
.
df1 <- getBaseModificationDf(region = Human_dmR[1,], BSgenome = BSgenome_human, input = Human_BSobj, base = "C", motif = c("CN","CH"))
If the motifs are specified above, the CH(H->A/T/C)
will cover some parts of CN(N->A/T/G/C)
. The result will have two parts, CH
and CN
, which is out of expectation for the reason that N
contains H
. In a word, Users should be aware of the order of the motif
.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.