README.md

seqlm

An R package for identification of differentially methylated regions (DMRs) from high density chip, for example Illumina 450K, data.

Citation

R Kolde*, K Märtens*, K Lokk, S Laur, J Vilo 2016. “Seqlm: an MDL Based Method for Identifying Differentially Methylated Regions in High Density Methylation Array Data.” Bioinformatics (Oxford, England): btw304.

Installation

The most convenient way to install the package is by using the devtools package.

library(devtools)
install_github("raivokolde/seqlm")

To start using the package just load it as any other package.

library(seqlm)

Usage

For running the algorithm one has to have three objects:

All the work is done by one command seqlm that takes as input all the objects described above and also parameters:

For more information use ?seqlm in R.

Optionally one can visualise the results using seqlmreport command that visualises the identified regions and creates a html page to show them. The main input for this function is the result of seqlm and the three objects described above. There are additional options to control the appearance of the plots, see ?seqlmreport for details.

Example

An example dataset tissue_small is included in the package. It contains data comparing adipose tissue and brainstem, from chromosomes 17 and 18. Finding regions in this data can be accomplished using commands:

data(tissue_small)
segments = seqlm(values = tissue_small$values, genome_information = tissue_small$genome_information, annotation =  tissue_small$annotation)

The result of the analysis is a GRanges object containing the locations of the regions and associated statistics.

The analysis can be time consuming, if the whole genome is analysed at once. If the computer has multicore capabilities it is easy to parallelize the calculations. We use the foreach framework by Revolution Computing for parallelization. To enable the parallelization one has to register the parallel backend before and this will be used by seqlm. Ideally, the next commands should take roughly half the time compared to the previous.

library(doParallel)
registerDoParallel(cores = 2)
segments = seqlm(values = tissue_small$values, genome_information = tissue_small$genome_information, annotation =  tissue_small$annotation)

To visualise the results it is possible to plot the most imortant sites and generate a HTML report

temp = tempdir()
seqlmreport(segments[1:10], tissue_small$values, tissue_small$genome_information, tissue_small$annotation, dir = temp)

Here is an example of the resulting file.

Method

The method is described in a poster, that was presented at the Epigenetics of Common Diseases conference. Briefly, the seqlm method works in three stages.

Stage 1: The genome is divided into smaller pieces based on a genomic distance cutoff.

Stage 2: In each piece probes are segmented into regions that have approximately constant difference between the groups of interest. Example of the segmentation and its process is shown in schema.

Stage 3: We assess the relevance of each segment, by using a mixed model where the classes are a fixed effect and a sample is a random effect. This model takes into account the repeated nature of the consecutive methylation measurements. The segments are ordered by their significance.

Example of seqlm segmentation



raivokolde/seqlm documentation built on May 26, 2019, 9:59 p.m.