splitDataByDensity: Split methylation data into regions based on the density of...

View source: R/utils.R

splitDataByDensity    R Documentation

Split methylation data into regions based on the density of CpGs

Description

This function splits the methylation data into regions based on the density of CpGs.

Usage

splitDataByDensity(
  dat,
  window.size = 100,
  by = 1,
  min.density = 5,
  gap = 10,
  min.cpgs = 50,
  max.cpgs = 2000,
  verbose = TRUE
)

Arguments

dat

a data frame in which each row is an individual CpG observed in a sample, covering all the samples. The first 4 columns should contain Meth_Counts (methylated counts), Total_Counts (read depths), Position (genomic position of the CpG site) and ID (sample ID). Covariate information, such as disease status or cell type composition, is listed in column 5 onwards.

window.size

positive integer defining the size of the sliding window in bp. Decimal values will be rounded to the nearest integer. The value should be greater than 10. The default value is 100 (100 bp).

by

positive integer defining how many base pairs the window moves at each increment. Decimal values will be rounded to the nearest integer. The default value is 1 (1 bp).

min.density

positive integer defining the minimum density threshold for each window, expressed as a number of CpGs per window. Decimal values will be rounded to the nearest integer. The default value is 5 (5 CpGs/window.size).

gap

positive integer defining the gap width beyond which two regions are considered independent. Decimal values will be rounded to the nearest integer. The default value is 10 (10 bp).

min.cpgs

positive integer defining the minimum number of CpGs within a region for the algorithm to perform optimally. The default value is 50.

max.cpgs

positive integer defining the maximum number of CpGs within a region for the algorithm to perform optimally. The default value is 2000.

verbose

logical indicating whether the algorithm should report progress information. The default value is TRUE.
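
To make the roles of window.size, by, min.density and gap more concrete, here is a toy sketch of a sliding-window density split on a vector of positions. It is not the SOMNiBUS implementation: the function name toy_split_by_density is hypothetical, gap is assumed to apply to the distance between consecutive retained CpGs, and min.cpgs and max.cpgs are not modelled at all.

```r
# Toy sketch only: NOT the SOMNiBUS implementation of splitDataByDensity().
# `toy_split_by_density` is a hypothetical name used for illustration.
toy_split_by_density <- function(pos, window.size = 100, by = 1,
                                 min.density = 5, gap = 10) {
  pos <- sort(unique(pos))
  # Window start coordinates, moving `by` bp at each step
  starts <- seq(min(pos), max(pos), by = by)
  # Number of CpGs falling in [start, start + window.size)
  n.in.window <- vapply(
    starts,
    function(s) sum(pos >= s & pos < s + window.size),
    integer(1)
  )
  dense.starts <- starts[n.in.window >= min.density]
  if (length(dense.starts) == 0) return(list())
  # Keep the CpGs covered by at least one dense window
  covered <- vapply(
    pos,
    function(p) any(p >= dense.starts & p < dense.starts + window.size),
    logical(1)
  )
  kept <- pos[covered]
  # Assumption: start a new region when consecutive kept CpGs are
  # more than `gap` bp apart
  region.id <- cumsum(c(TRUE, diff(kept) > gap))
  unname(split(kept, region.id))
}

# Two made-up clusters of 10 CpGs each, separated by a long CpG-poor stretch
pos <- c(seq(1, 50, by = 5), seq(1001, 1050, by = 5))
regions <- toy_split_by_density(pos, window.size = 100, min.density = 5, gap = 10)
length(regions)
```

In this toy input the two clusters end up in separate regions because the distance between them is far larger than gap.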

Value

A named list of data frames, each containing the data of one independent region.
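
As a rough illustration of how such a list might be inspected, the sketch below counts the distinct CpG positions per region. It uses a mock list rather than real output: the region names and values are made up, and it assumes each data frame keeps the input columns (in particular Position), which this page does not state explicitly.

```r
# Mock stand-in for the returned list; region names and values are made up
results.mock <- list(
  regionA = data.frame(Meth_Counts = c(1, 0, 3, 2), Total_Counts = c(4, 2, 5, 2),
                       Position = c(10, 10, 15, 15), ID = c(1, 2, 1, 2)),
  regionB = data.frame(Meth_Counts = c(2, 1), Total_Counts = c(3, 3),
                       Position = c(500, 500), ID = c(1, 2))
)

# Number of distinct CpG positions in each region, named by region
n.cpgs <- vapply(results.mock, function(d) length(unique(d$Position)), integer(1))
n.cpgs
```

The same call applied to a real result could be compared against min.cpgs and max.cpgs to see how the region sizes are distributed.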

Author(s)

Audrey Lemaçon

Examples

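#------------------------------------------------------------#
# Load the example data, then drop zero-depth rows and rows with missing values
data(RAdat)
RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ])
# Split the filtered data into regions of dense CpGs
results <- splitDataByDensity(
  dat = RAdat.f, window.size = 100, by = 1,
  min.density = 5, gap = 10, min.cpgs = 50, verbose = FALSE
)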
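
The filtering step in the example above can be tried on a small made-up data frame that follows the column layout described for dat. All values are invented, and the covariate column name disease is hypothetical.

```r
# Toy input with the column layout described under `dat`; all values are made up
toy <- data.frame(
  Meth_Counts  = c(2, 0, NA, 5),
  Total_Counts = c(4, 0, 3, 6),
  Position     = c(101, 105, 110, 118),
  ID           = c(1, 1, 2, 2),
  disease      = c(0, 0, 1, 1)   # hypothetical covariate column
)

# Same filter as in the example: drop zero-depth rows, then rows with any NA
toy.f <- na.omit(toy[toy$Total_Counts != 0, ])
nrow(toy.f)
```

Here the second row is dropped because its read depth is zero, and the third because its methylated count is missing.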

kaiqiong/SOMNiBUS documentation built on Feb. 24, 2023, 5:38 a.m.