grooMethy: Groom methylation data to fix potential data issues

View source: R/grooMethy.R

Description

grooMethy automatically detects and fixes common data issues, including zero (or one) beta values, missing values, and infinite values.

Usage

grooMethy(
  methyDat,
  Seq.GR = NULL,
  impute = TRUE,
  imputebyrow = TRUE,
  mapGenome = FALSE,
  verbose = FALSE
)

Arguments

methyDat

A RatioSet, GenomicRatioSet, DataFrame, data.table, data.frame, or matrix of Illumina BeadChip methylation data (450k or EPIC array), or of methylation percentage estimates obtained by sequencing. If the data are supplied as a data.frame or a similar format: for Illumina array data, make sure the Illumina probe names (e.g. cg00000029) are available either as row names or as a column; for sequencing methylation data, provide the corresponding CpG locations in Seq.GR.
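
As an illustration, the sketch below shows one way to prepare a data.frame of array beta values so that the probe names are carried as row names. The beta values are made up, the probe IDs other than cg00000029 are illustrative only, and the grooMethy call is left commented out so the snippet runs without REMP installed.

```r
# Toy beta values for two samples; probe IDs and values are illustrative only.
beta_df <- data.frame(
  probe = c("cg00000029", "cg00000108", "cg00000109"),
  S1 = c(0.45, 0.91, 0.12),
  S2 = c(0.50, 0.88, 0.10),
  stringsAsFactors = FALSE
)

# Keep only the numeric sample columns and move the probe IDs into the
# row names, so each row is one CpG probe and each column is one sample.
beta_mat <- as.matrix(beta_df[, c("S1", "S2")])
rownames(beta_mat) <- beta_df$probe

# With REMP installed, the matrix could then be groomed:
# groomed <- grooMethy(beta_mat)
```

Keeping the probe IDs as row names leaves a purely numeric matrix, which is the same shape as the output of minfi::getBeta used in the Examples below.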

Seq.GR

A GRanges object containing the genomic locations of the CpGs profiled by a sequencing platform. This parameter must not be NULL if methyDat was obtained by sequencing. The CpGs in Seq.GR must be in the same order as those in methyDat. The genomic locations can be in either the hg19 or the hg38 build. See Details.

impute

If TRUE, K-nearest-neighbour (KNN) imputation is applied to fill in missing values. Default = TRUE. See Details.

imputebyrow

If TRUE, missing values are imputed using similar values in the same row (i.e., across samples); if FALSE, they are imputed using similar values in the same column (i.e., across CpGs). Default = TRUE.
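
To illustrate the row versus column choice, the sketch below is a simplified, base-R stand-in for KNN imputation. It is not the impute::impute.knn algorithm or REMP's own code, and the function name knn_impute_rows is hypothetical. It assumes a numeric matrix: for each row with missing entries it finds the k rows closest over the shared observed columns and averages their values. Applying it to the matrix or to its transpose shows the two orientations that imputebyrow chooses between.

```r
# Hypothetical, simplified KNN imputation (illustration only; not impute::impute.knn).
# Rows are the units being imputed; distances are computed across columns.
knn_impute_rows <- function(mat, k = 2) {
  out <- mat
  for (i in which(rowSums(is.na(mat)) > 0)) {
    obs <- !is.na(mat[i, ])  # columns observed in row i
    # Mean squared difference between row i and every row, over those columns
    d <- apply(mat[, obs, drop = FALSE], 1, function(r)
      mean((r - mat[i, obs])^2, na.rm = TRUE))
    d[i] <- Inf  # a row is never its own neighbour
    for (j in which(!obs)) {
      # Candidate neighbours need a value in column j and a finite distance
      cand <- which(!is.na(mat[, j]) & is.finite(d))
      nn <- cand[order(d[cand])][seq_len(min(k, length(cand)))]
      out[i, j] <- mean(mat[nn, j])
    }
  }
  out
}

# Toy matrix: rows 1, 2 and 4 share a similar profile, row 3 does not.
m <- matrix(c(1.0, 2.0, 3.0,
              1.1, 2.1, NA,
              5.0, 6.0, 7.0,
              1.2, 2.2, 3.2), nrow = 4, byrow = TRUE)

by_row <- knn_impute_rows(m, k = 2)        # neighbours are other rows
by_col <- t(knn_impute_rows(t(m), k = 2))  # neighbours are other columns
```

In this toy case the row-wise call fills m[2, 3] from rows 1 and 4 (giving 3.1), while the transposed call borrows from columns 1 and 2 of row 2 (giving 1.6), so the orientation can change the imputed value considerably.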

mapGenome

Logical parameter. If TRUE, the function returns a GenomicRatioSet object instead of a RatioSet. Default = FALSE. This option is not applicable to sequencing data.

verbose

Logical parameter. Should the function be verbose?

Details

For methylation data given as beta values, any value of exactly zero or one makes the logit transformation from beta to M value produce an infinite value. Therefore, zero beta values are replaced with the smallest non-zero beta value, and one values with the largest non-one beta value, found in the dataset.

grooMethy also handles missing values (i.e. NA or NaN) using KNN imputation (see impute.knn in package impute). Infinite values are also treated as missing values for imputation. If the original dataset is in beta values, grooMethy first transforms it to M values before imputation is carried out. If an imputed value falls outside the original range (which is possible when imputebyrow = FALSE), the mean value is used instead. Warning: imputed values for CpGs with a multimodal distribution across samples may not be correct; package ENmix can be used to identify such CpGs. Note that grooMethy is also called within remp, so users can run remp directly without explicitly running grooMethy first.

For sequencing methylation data, specify the genomic locations of the CpGs as a GRanges object and pass it via Seq.GR. For an example of Seq.GR, run minfi::getLocations(IlluminaHumanMethylation450kanno.ilmn12.hg19) (the row names of the CpGs in Seq.GR can be NULL). Make sure the genome build of Seq.GR matches the build specified in the genome parameter of initREMP and remprofile (default is "hg19").
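
The zero/one replacement and the treatment of infinite values can be sketched in base R as follows. The helper names fix_zero_one and beta_to_m are hypothetical and this is not REMP's internal code; the sketch also assumes that the M value is the base-2 logit of beta, the convention used by minfi.

```r
# Hypothetical helpers illustrating the Details above; not REMP's internal code.
fix_zero_one <- function(beta) {
  inside <- which(beta > 0 & beta < 1)          # entries strictly between 0 and 1
  beta[which(beta == 0)] <- min(beta[inside])   # smallest non-zero beta
  beta[which(beta == 1)] <- max(beta[inside])   # largest non-one beta
  beta
}

# Assumes M is the base-2 logit of beta (minfi's convention).
beta_to_m <- function(beta) log2(beta / (1 - beta))

beta <- matrix(c(0.0, 0.2, 0.5,
                 0.8, 1.0, NA), nrow = 2, byrow = TRUE)

m_raw <- beta_to_m(beta)            # 0 and 1 map to -Inf and Inf
m <- beta_to_m(fix_zero_one(beta))  # zeros become 0.2, ones become 0.8
m[!is.finite(m)] <- NA              # any remaining Inf/NaN is treated as missing
m                                   # rows: -2 -2 0 and 2 2 NA
```

The remaining NA would then be passed to KNN imputation, as described in the Details.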

Value

A RatioSet or GenomicRatioSet containing both the beta values and the M values of the methylation data.
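
For reference, assuming the conventional base-2 logit (as used by minfi), the two scales stored in the returned object are related as shown below; the first expression also shows why a beta value of exactly 0 or 1 gives an infinite M value.

```latex
M = \log_2\!\left(\frac{\beta}{1-\beta}\right),
\qquad
\beta = \frac{2^{M}}{1 + 2^{M}},
\qquad
\beta \to 0 \Rightarrow M \to -\infty,\quad
\beta \to 1 \Rightarrow M \to +\infty
```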

Examples

# Get GM12878 methylation data (450k array)
if (!exists("GM12878_450k")) GM12878_450k <- getGM12878("450k")
GM12878_450k <- grooMethy(GM12878_450k, verbose = TRUE)

# Also works if the input is a matrix
grooMethy(minfi::getBeta(GM12878_450k), verbose = TRUE)

YinanZheng/REMP documentation built on May 14, 2022, 5:58 p.m.