methSeg: Segment methylation or differential methylation profile

View source: R/methSeg.R

methSegR Documentation

Segment methylation or differential methylation profile

Description

The function uses a segmentation algorithm (fastseg) to segment the methylation profiles. Following that, it uses gaussian mixture modelling to cluster the segments into k components. This process uses mean methylation value of each segment in the modeling phase. Each component ideally indicates quantitative classification of segments, such as high or low methylated regions.

Usage

methSeg(
  obj,
  diagnostic.plot = TRUE,
  join.neighbours = FALSE,
  initialize.on.subset = 1,
  ...
)

Arguments

obj

GRanges, methylDiff, methylDiffDB, methylRaw or methylRawDB . If the object is a GRanges it should have one meta column with methylation scores and has to be sorted by position, i.e. ignoring the strand information.

diagnostic.plot

if TRUE a diagnostic plot is plotted. The plot shows methylation and length statistics per segment group. In addition, it shows diagnostics from mixture modeling: the density function estimated and BIC criterion used to decide the optimum number of components in mixture modeling.

join.neighbours

if TRUE neighbouring segments that cluster to the same seg.group are joined by extending the ranges, summing up num.marks and averaging over seg.means.

initialize.on.subset

a numeric value indicating either percentage or absolute value of regions to subsample from segments before performing the mixture modeling. The value can be either between 0 and 1, e.g. 0.1 means that 10 integer higher than 1 to define the number of regions to sample. By default uses the whole dataset, which can be time consuming on large datasets. (Default: 1)

...

arguments to fastseg function in fastseg package, or to densityMclust in Mclust package, could be used to fine tune the segmentation algorithm. E.g. Increasing "alpha" will give more segments. Increasing "cyberWeight" will give also more segments."maxInt" controls the segment extension around a breakpoint. "minSeg" controls the minimum segment length. "G" argument denotes number of components used in BIC selection in mixture modeling. For more details see fastseg and Mclust documentation.

Details

To be sure that the algorithm will work on your data, the object should have at least 5000 records

After initial segmentation with fastseg(), segments are clustered into self-similar groups based on their mean methylation profile using mixture modeling. Mixture modeling estimates the initial parameters of the distributions by using the whole dataset. If "initialize.on.subset" argument set as described, the function will use a subset of the data to initialize the parameters of the mixture modeling prior to the Expectation-maximization (EM) algorithm used by Mclust package.

Value

A GRanges object with segment classification and information. 'seg.mean' column shows the mean methylation per segment. 'seg.group' column shows the segment groups obtained by mixture modeling

Author(s)

Altuna Akalin, contributions by Arsene Wabo and Katarzyna Wreczycka

See Also

methSeg2bed, joinSegmentNeighbours

Examples



 download.file(
 "https://raw.githubusercontent.com/BIMSBbioinfo/compgen2018/master/day3_diffMeth/data/H1.chr21.chr22.rds",
 destfile="H1.chr21.chr22.rds",method="curl")

 mbw=readRDS("H1.chr21.chr22.rds")

 # it finds the optimal number of componets as 6
 res=methSeg(mbw,diagnostic.plot=TRUE,maxInt=100,minSeg=10)

 # however the BIC stabilizes after 4, we can also try 4 componets
 res=methSeg(mbw,diagnostic.plot=TRUE,maxInt=100,minSeg=10,G=1:4)

 # get segments to BED file
 methSeg2bed(res,filename="H1.chr21.chr22.trial.seg.bed")


unlink(list.files(pattern="H1.chr21.chr22",full.names=TRUE))


al2na/methylKit documentation built on Nov. 30, 2024, 5:44 p.m.