UniD.probefilter: Probe and sample filter


View source: R/UniD.probefilter.R

Description

Filters out probes with suspected quality issues in the following categories: (1) probes located on ChrX/Y; (2) probes that may be affected by SNPs; (3) probes that may map to multiple locations; (4) probes that do not target CpG sites; (5) probes on the 450k platform that are not available on the EPIC platform; (6) probes with a large proportion of missing values. It can also filter out samples with a large proportion of missing values. Missing values may be caused by a non-significant detection p-value or a low bead count per probe.
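As a rough illustration of how such missing values can arise, the base R sketch below masks Beta values whose detection p-value is not significant or whose bead count is low. The matrices `beta`, `detP` and `beads`, and the cutoffs 0.05 and 3, are hypothetical choices made for this example; they are not taken from the UniD source, and the masking itself happens before UniD.probefilter() is called.

```r
# Toy data: 4 probes x 3 samples (all values invented for illustration).
beta <- matrix(c(0.10, 0.80, 0.50, 0.30,
                 0.20, 0.70, 0.40, 0.90,
                 0.15, 0.85, 0.55, 0.35),
               nrow = 4,
               dimnames = list(paste0("cg", 1:4), paste0("S", 1:3)))
detP  <- matrix(0.001, nrow = 4, ncol = 3, dimnames = dimnames(beta))
beads <- matrix(10,    nrow = 4, ncol = 3, dimnames = dimnames(beta))

detP["cg2", "S1"]  <- 0.20  # a non-significant detection p-value
beads["cg4", "S3"] <- 1     # a low bead count

# Assumed cutoffs (hypothetical, not UniD defaults).
p_cut    <- 0.05
bead_cut <- 3

# Set unreliable measurements to NA; all other Beta values are unchanged.
beta_masked <- beta
beta_masked[detP > p_cut | beads < bead_cut] <- NA
```

Each masked cell then counts toward the per-sample and per-probe missing-value proportions that the filter evaluates.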

Usage

UniD.probefilter(Beta.raw, outDir, filterXY = TRUE, filterSNPHit = TRUE,
  filterMultiHit = TRUE, filterNonCG = FALSE, filterNonEpic = TRUE,
  arrayType = c("450k", "EPIC"), filterSample = TRUE,
  filterSample.cut = 0.1, filterProbe = FALSE, filterProbe.cut = 0.05,
  write)

Arguments

Beta.raw

Data frame generated by UniD.dataqc().

outDir

Directory where output data should be saved if write = TRUE.

filterXY

Whether to filter out probes located on chromosomes X and Y. Default is TRUE.

filterSNPHit

Whether to filter out probes that may be affected by SNPs. Default is TRUE. Probe list adapted from Zhou W, Nucleic Acids Research, 2017.

filterMultiHit

Whether to filter out probes that can map to multiple locations. Default is TRUE. Probe list adapted from Nordlund J, Genome Biology, 2013.

filterNonCG

Whether to filter out probes that do not target CpG sites. Default is FALSE.

filterNonEpic

Whether to filter out probes that are available on the 450k platform but not on the EPIC platform. Default is TRUE. Highly recommended when building models with data from the 450k platform.

arrayType

The platform on which the raw data were generated; either "450k" or "EPIC".

filterSample

Whether to filter out samples with a high proportion of missing values. Default is TRUE.

filterSample.cut

If filterSample = TRUE, the threshold for a high proportion of missing values per sample. Default is 0.1.

filterProbe

Whether to filter out probes with a high proportion of missing values. Default is FALSE. Use with care when the number of samples is small.

filterProbe.cut

If filterProbe = TRUE, the threshold for a high proportion of missing values per probe. Default is 0.05.

write

Whether the output should be saved. Highly recommended.
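The two missing-value cutoffs can be pictured with a small base R sketch. It is not the package's implementation: the toy data, the order of the two steps (samples first, then probes) and the treatment of values exactly at the cutoff are all assumptions made for illustration.

```r
# Toy Beta matrix: 20 probes x 4 samples (values invented for illustration).
set.seed(1)
beta <- matrix(runif(80), nrow = 20,
               dimnames = list(paste0("cg", 1:20), paste0("S", 1:4)))
beta[1:5, "S4"] <- NA  # S4 is missing 5 of 20 probes (25%)
beta[10, "S1"]  <- NA  # S1 is missing 1 of 20 probes (5%)

filterSample.cut <- 0.1
filterProbe.cut  <- 0.05

# Step 1 (assumed order): drop samples whose missing proportion exceeds the cutoff.
sample_missing <- colMeans(is.na(beta))
beta_samples <- beta[, sample_missing <= filterSample.cut, drop = FALSE]

# Step 2: drop probes whose missing proportion exceeds the cutoff.
probe_missing <- rowMeans(is.na(beta_samples))
beta_filtered <- beta_samples[probe_missing <= filterProbe.cut, , drop = FALSE]

dim(beta_filtered)
```

In this toy case sample S4 is removed, and probe cg10 is then removed because a single missing value among the three remaining samples already exceeds 0.05. This is why the probe-level filter needs care when the number of samples is small.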

Value

A data frame of Beta values for the probes that remain after filtering.

References

Zhou, W., et al. (2017). "Comprehensive characterization, annotation and innovative use of Infinium DNA methylation BeadChip probes." Nucleic Acids Res 45(4): e22.

Nordlund, J., et al. (2013). "Genome-wide signatures of differential DNA methylation in pediatric acute lymphoblastic leukemia." Genome Biol 14(9): r105.

Examples

## Not run: 
# 450k data: drop probes absent from EPIC and write the result to outDir
Beta.clean <- UniD.probefilter(Beta.raw, outDir = "~/Desktop/output/",
  filterXY = TRUE, filterSNPHit = TRUE, filterMultiHit = TRUE,
  filterNonCG = FALSE, filterNonEpic = TRUE, arrayType = "450k",
  filterSample = TRUE, filterSample.cut = 0.1, filterProbe = FALSE,
  write = TRUE)

# EPIC data: keep the result in memory only (nothing is written)
Beta.clean <- UniD.probefilter(Beta.raw, outDir = NULL,
  filterXY = TRUE, filterSNPHit = TRUE, filterMultiHit = TRUE,
  filterNonCG = FALSE, filterNonEpic = FALSE, arrayType = "EPIC",
  filterSample = TRUE, filterSample.cut = 0.1, filterProbe = FALSE,
  write = FALSE)

## End(Not run)

JieYang031/UniD documentation built on May 5, 2021, 5:16 p.m.