DNA methylation studies have increased in number over the past decade thanks
to the recent advances in next-generation sequencing (NGS) and microarray
technology (MA), providing many data sets at high resolution, enabling
researchers to understand methylation patterns and their regulatory roles in
biological processes and diseases.
Notwithstanding that diverse methods and software have created ample opportunities for researchers to do quantitative analysis, they make it difficult for practitioners to choose the one that is suitable and efficient in analyzing DNA methylation data. Having examined most of differentially methylation identification tools for bisulfite sequencing (BS-Seq) data, we observed several drawbacks in the existing analytic tools. To address these issues we have developed a novel differentially methylated CpG site identification tool which is based on Hidden Markov models (HMM) called
DMCHMM. This vignette provides some guidelines on
how to use the package and analyze BS-Seq data.
Following topics will be discussed in this vignette:
Two different classes are defined by extending the
BSData-class is designed to hold BS-Seq
cBSData-method is defined to create a
This class includes two slots:
methReads, a matrix with columns representing samples and rows
representing genomic positions (CpG sites) and elements of matrix representing
methylation counts at each position in each sample;
totalReads, a matrix with similar columns and rows except the elements
representing total number of reads.
For reading raw BS-Seq data we adopted The
readBismark function from
readBismark-method reads samples stored in different files with
six columns of chromosome, start position, end position,
methylation percentage, number of Cs and number of Ts.
Three data files are included in the
DMCHMM package for illustration.
The data can be imported using following code.
library(DMCHMM) fn <- list.files(system.file("extdata",package = "DMCHMM")) fn.f <- list.files(system.file("extdata",package="DMCHMM"), full.names=TRUE) OBJ <- readBismark(fn.f, fn, mc.cores = 2) cdOBJ <- DataFrame(Cell = factor(c("BC", "TC","Mono"), labels = c("BC", "TC", "Mono")), row.names = c("BCU1568","BCU173","BCU551")) colData(OBJ) <- cdOBJ OBJ
The above data set only include one sample for each cell type. We need more samples to be able to compare their methylations and find DMCs. For illustration we generate a sample of BS-Seq data as follows.
nr <- 150; nc <- 8 metht <- matrix(as.integer(runif(nr * nc, 0, 20)), nr) methc <- matrix(rbinom(n=nr*nc,c(metht),prob = runif(nr*nc)),nr,nc) r1 <- GRanges(rep("chr1", nr), IRanges(1:nr, width=1), strand="*") names(r1) <- 1:nr cd1 <- DataFrame(Group=rep(c("G1","G2"),each=nc/2),row.names=LETTERS[1:nc]) OBJ1 <- cBSData(rowRanges=r1,methReads=methc,totalReads=metht,colData=cd1) OBJ1
There are two approaches to smoothed the data before testing for DMCs. Either EM
or MCMC can be used to predict methylation levels utilizing HMM. The
methHMEM-method which is developed to predict methylation levels. The output
BSDMCs-class that can be either used to find DMCs or use MCMC algorithm
to re-smooth the raw data. The process is as follows.
OBJ2 <- methHMEM(OBJ1, MaxK=2) OBJ2
Although EM algorithm is a fast way to smooth the data but the results are not
as good as the MCMC algorithm. The MCMC algorithm, however, is slow. In order to
increase the speed, we first use
methHMEM-method to find the HMM order for
each sample and then we use
methHMCMC-method to predict methylation levels.
The procedure is as follows.
OBJ3 <- methHMMCMC(OBJ2) OBJ3
Having smoothed the data using HMM, we run linear between predicted methylation
levels and grouping covariate. In case other covariates exist, one can use the
formula argument to specify a linear model. When there is no covariates no
action is required. The following command identifys the DMCs. The results are
stored in a
BSDMCs-class and can be retrived by calling
OBJ4 <- findDMCs(OBJ3) head(metadata(OBJ4)$DMCHMM)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.