In kaiqiong/SOMNiBUS: Smooth modeling of bisulfite sequencing

splitDataByBed

R Documentation

Split methylation data into regions based on the genomic annotations

This function splits the methylation data into regions based on the genomic annotation provided in the form of a 1-based BED file.
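The splitting idea can be sketched in base R as follows. This is a minimal illustration, not the SOMNiBUS implementation. It assumes strict overlap (the default `gap = -1`) and reads BED intervals as closed, 1-based `[start, end]` ranges. The helper `split_by_regions()`, the `regions` data frame and its column names are hypothetical.

```r
# Hypothetical helper: assign each CpG to the first BED region containing it
# and return one data frame per region (CpGs outside every region are dropped).
split_by_regions <- function(dat, chr, regions) {
  stopifnot(length(chr) == nrow(dat))
  region_of <- rep(NA_character_, nrow(dat))
  for (i in seq_len(nrow(regions))) {
    hit <- chr == regions$chrom[i] &
      dat$Position >= regions$start[i] &
      dat$Position <= regions$end[i] &
      is.na(region_of)                 # keep the first matching region only
    region_of[hit] <- regions$name[i]
  }
  keep <- !is.na(region_of)
  split(dat[keep, , drop = FALSE], region_of[keep])
}

# Toy data: 2 samples x 3 CpGs, plus one covariate column (values made up).
dat <- data.frame(
  Meth_Counts  = c(3, 0, 5, 2, 1, 4),
  Total_Counts = c(10, 8, 9, 7, 6, 10),
  Position     = c(100, 150, 900, 100, 150, 900),
  ID           = rep(c("s1", "s2"), each = 3),
  disease      = rep(c(1, 0), each = 3),
  stringsAsFactors = FALSE
)
chr <- rep("chr1", nrow(dat))
regions <- data.frame(chrom = "chr1", start = c(50, 800), end = c(200, 1000),
                      name = c("regionA", "regionB"), stringsAsFactors = FALSE)

parts <- split_by_regions(dat, chr, regions)
sapply(parts, nrow)  # regionA: 4 rows, regionB: 2 rows
```

Each element keeps every column of `dat`, so the covariates stay attached to their CpGs.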

splitDataByBed(
  dat,
  chr,
  bed,
  gap = -1,
  min.cpgs = 50,
  max.cpgs = 2000,
  verbose = TRUE
)

`dat`	a data frame with rows as individual CpGs appearing in all the samples. The first 4 columns must contain `Meth_Counts` (methylated counts), `Total_Counts` (read depths), `Position` (genomic position of the CpG site) and `ID` (sample ID). Covariate information, such as disease status or cell type composition, is listed in column 5 onwards.
`chr`	character vector giving the chromosome of each row of `dat`. Its length must equal the number of rows in `dat`.
`bed`	character, path to the 1-based BED file containing the annotations
`gap`	integer defining the maximum gap allowed between two ranges for them to be considered overlapping. Following the `GenomicRanges::findOverlaps` function, the gap between two ranges is the number of positions that separate them, so the gap between two adjacent ranges is 0. By convention, when one range has its start or end strictly inside the other (i.e. non-disjoint ranges), the gap is considered to be -1. Decimal values are rounded to the nearest integer. The default value is `-1` (strict overlap).
`min.cpgs`	positive integer defining the minimum number of CpGs a region should contain for the algorithm to perform optimally. The default value is 50.
`max.cpgs`	positive integer defining the maximum number of CpGs a region should contain for the algorithm to perform optimally. The default value is 2000.
`verbose`	logical; indicates whether the algorithm should report progress information. The default value is TRUE.
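The following sketch builds toy inputs shaped as the argument descriptions above require. The covariate name `disease`, the region name and all values are made up, and the BED file is written with a 4-column layout (chrom, start, end, name) using 1-based coordinates. The call to `splitDataByBed()` is left as a comment because it needs SOMNiBUS installed, and its return value is not described here.

```r
# Toy inputs: 60 CpG sites observed in each of 2 samples (values made up).
n_cpg <- 60
positions <- seq(1000, by = 10, length.out = n_cpg)   # 1000, 1010, ..., 1590
samples <- c("s1", "s2")

dat <- data.frame(
  Meth_Counts  = rep(5, n_cpg * length(samples)),
  Total_Counts = rep(20, n_cpg * length(samples)),
  Position     = rep(positions, times = length(samples)),
  ID           = rep(samples, each = n_cpg),
  disease      = rep(c(0, 1), each = n_cpg),   # covariate in column 5
  stringsAsFactors = FALSE
)
# One chromosome label per row of `dat`.
chr <- rep("chr1", nrow(dat))

# A one-region BED file (tab-separated chrom, start, end, name).
bed_path <- tempfile(fileext = ".bed")
writeLines(paste("chr1", 1000, 1600, "region1", sep = "\t"), bed_path)

# With SOMNiBUS installed:
# splitDataByBed(dat, chr, bed_path, gap = -1, min.cpgs = 50, max.cpgs = 2000)
```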
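The `gap` convention can be made concrete with a small base-R function. This is a sketch of the rule as quoted for `GenomicRanges::findOverlaps` (adjacent ranges have gap 0, non-disjoint ranges have gap -1) for non-empty, closed integer ranges. The helpers `range_gap()` and `within_gap()` are hypothetical and are not part of SOMNiBUS or GenomicRanges.

```r
# Gap between closed integer ranges [s1, e1] and [s2, e2]:
# number of positions separating them, or -1 if they share any position.
range_gap <- function(s1, e1, s2, e2) {
  max(-1, s2 - e1 - 1, s1 - e2 - 1)
}

# Hypothetical check mirroring the `gap` argument: two ranges count as
# overlapping when their gap does not exceed round(gap).
within_gap <- function(s1, e1, s2, e2, gap = -1) {
  range_gap(s1, e1, s2, e2) <= round(gap)
}

range_gap(1, 5, 3, 10)  # -1: the ranges share positions 3 to 5
range_gap(1, 5, 6, 10)  #  0: adjacent
range_gap(1, 5, 8, 10)  #  2: positions 6 and 7 separate them

within_gap(1, 5, 8, 10)           # FALSE with the default gap = -1
within_gap(1, 5, 8, 10, gap = 2)  # TRUE
```

Note that R's `round()` rounds halves to even (`round(2.5)` is 2), so values exactly halfway between two integers may not round the way one expects.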
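Since `dat` has one row per CpG per sample, the number of CpGs in a region is the number of distinct positions it contains. The sketch below counts them and flags regions outside `[min.cpgs, max.cpgs]`. It is only a check on the input: the helper `flag_region_sizes()` is hypothetical, and it makes no claim about how `splitDataByBed` itself handles regions outside these bounds.

```r
# Hypothetical helper: count distinct CpG positions per region and report
# whether each count lies within [min.cpgs, max.cpgs].
flag_region_sizes <- function(positions, region, min.cpgs = 50, max.cpgs = 2000) {
  # One row per (region, position), so repeated samples are not double counted.
  uniq <- unique(data.frame(region = region, position = positions,
                            stringsAsFactors = FALSE))
  n <- table(uniq$region)
  data.frame(
    region   = names(n),
    n_cpgs   = as.integer(n),
    in_range = as.integer(n) >= min.cpgs & as.integer(n) <= max.cpgs,
    stringsAsFactors = FALSE
  )
}

# Toy example: region "a" has 60 CpGs, region "b" has 10, each seen in 2 samples.
pos <- c(1:60, 101:110)
reg <- c(rep("a", 60), rep("b", 10))
flag_region_sizes(rep(pos, 2), rep(reg, 2))
```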