qcfilter: Filtering out low quality and outlier data

qcfilter R Documentation

Filtering out low quality and outlier data

Description

Outliers are defined as values more than 3 times the IQR below the lower quartile or more than 3 times the IQR above the upper quartile. If data quality information is provided, low quality data points are set to missing before outliers are identified. All outliers and low quality data points are set to missing (NA) in the output matrix. If impute=TRUE, all missing values are then imputed using the k-nearest neighbors method.
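
The outlier rule can be illustrated in base R. The following is a minimal sketch of the 3 x IQR rule as described above, not the package's own implementation; the helper name flag_outliers and the toy matrix are hypothetical.

```r
# Hypothetical helper: set values lying more than k x IQR outside the
# quartiles to NA (k = 3 matches the rule described above).
flag_outliers <- function(x, k = 3) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[2] - q[1]
  x[!is.na(x) & (x < q[1] - k * iqr | x > q[2] + k * iqr)] <- NA
  x
}

# Toy beta-value matrix: 2 probes (rows) x 6 samples (columns)
toy <- rbind(
  probe1 = c(0.10, 0.12, 0.11, 0.13, 0.12, 0.95),
  probe2 = c(0.50, 0.52, 0.49, 0.51, 0.50, 0.51)
)

# Row-by-row search, analogous to byrow=TRUE; apply() returns the rows
# as columns, so the result is transposed back.
filtered <- t(apply(toy, 1, flag_outliers))
```

In this toy matrix only the value 0.95 in probe1 falls outside the fences and is replaced by NA.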

Usage

qcfilter(mat,qcscore=NULL,rmoutlier=TRUE,byrow=TRUE,detPthre=0.000001,nbthre=3,
     rmcr=FALSE,rthre=0.05,cthre=0.05,impute=FALSE,imputebyrow=TRUE,fastimpute=FALSE,...)

Arguments

mat

A numeric matrix containing methylation beta values

qcscore

If data quality information (the output of the function QCinfo) is provided, low quality data points, as defined by the detection P value threshold (detPthre) and the number of beads threshold (nbthre), will be set to missing.

rmoutlier

If TRUE, outlier data points will be set to missing (NA).

byrow

TRUE: look for outliers row by row; FALSE: column by column.

detPthre

Detection P value threshold used to define low quality data points; detPthre=0.000001 by default.

nbthre

Number of beads threshold used to define low quality data points; nbthre=3 by default.

rmcr

TRUE: exclude rows and columns with too many missing values, as defined by rthre and cthre. The default is FALSE.

rthre

Minimum proportion of missing values for a row to be excluded; rthre=0.05 by default.

cthre

Minimum proportion of missing values for a column to be excluded; cthre=0.05 by default.

impute

If TRUE, the k-nearest neighbors method will be used for imputation.

imputebyrow

TRUE: impute missing values using similar values in the row; FALSE: in the column.

fastimpute

If TRUE, probe median will be used for fast imputation.

...

Arguments to be passed to the function impute.knn in R package "impute"
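
How the quality thresholds, the row and column missingness thresholds, and median imputation could fit together can be sketched in base R. This is an illustration under stated assumptions, not the ENmix source: the matrices detp and nbead are hypothetical stand-ins for the detection P values and bead counts carried in the QCinfo output, and the comparison directions (P value above detPthre, bead count below nbthre, missing proportion above rthre or cthre) and the use of the row as the probe are assumptions.

```r
# Toy beta matrix: 3 probes x 4 samples (all object names are hypothetical)
beta <- matrix(c(0.2, 0.8, 0.5,
                 0.3, 0.7, 0.4,
                 0.2, 0.9, 0.5,
                 0.1, 0.8, 0.6),
               nrow = 3,
               dimnames = list(paste0("cg", 1:3), paste0("s", 1:4)))

# Hypothetical quality matrices with the same shape as beta
detp  <- matrix(0,  nrow = 3, ncol = 4, dimnames = dimnames(beta))
nbead <- matrix(10, nrow = 3, ncol = 4, dimnames = dimnames(beta))
detp["cg1", "s1"]  <- 0.01  # poor detection P value
nbead["cg1", "s2"] <- 2     # too few beads

# Thresholds taken from the documented defaults
detPthre <- 0.000001
nbthre   <- 3
rthre    <- 0.05
cthre    <- 0.05

# Step 1: set low quality data points to NA (assumed comparison directions)
masked <- beta
masked[detp > detPthre | nbead < nbthre] <- NA

# Step 2: drop rows and columns whose proportion of NA exceeds the
# thresholds, in the spirit of rmcr=TRUE
keep_rows <- rowMeans(is.na(masked)) <= rthre
keep_cols <- colMeans(is.na(masked)) <= cthre
kept <- masked[keep_rows, keep_cols, drop = FALSE]

# Alternative to step 2: fill each NA with its row (probe) median,
# in the spirit of fastimpute=TRUE
imputed <- t(apply(masked, 1, function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}))
```

With these toy values, probe cg1 and samples s1 and s2 are dropped in step 2, while the median alternative keeps the full matrix and fills the two masked cg1 values instead.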

Value

The output is a numeric matrix.

Author(s)

Zongli Xu

References

Zongli Xu, Liang Niu, Leping Li and Jack A. Taylor, ENmix: a novel background correction method for Illumina HumanMethylation450 BeadChip. Nucleic Acids Research 2015.

Examples



if (require(minfiData)) {
path <- file.path(find.package("minfiData"),"extdata")
rgSet <- readidat(path = path,recursive = TRUE)
qc=QCinfo(rgSet)
mdat=preprocessENmix(rgSet,QCinfo=qc,nCores=6)
mdat=norm.quantile(mdat,method="quantile1")
beta=rcp(mdat)
# filter out outlier data points only
b1=qcfilter(beta)
# filter out low quality and outlier data points
b2=qcfilter(beta,qcscore=qc)
# filter out low quality and outlier values, and remove rows and columns
# with too many missing values
b3=qcfilter(beta,qcscore=qc,rmcr=TRUE)
# filter out low quality and outlier values, remove rows and columns with
# too many missing values, and then impute the remaining missing values
b4=qcfilter(beta,qcscore=qc,rmcr=TRUE,impute=TRUE)
}

xuz1/ENmix documentation built on Nov. 24, 2024, 4:31 a.m.