DupChecker: a package for checking high-throughput genomic data redundancy in meta-analysis

Meta-analysis has become a popular approach for high-throughput genomic data analysis because it can often significantly increase the power to detect biological signals or patterns in datasets. However, when publicly available databases are used for meta-analysis, duplicated samples are a frequently encountered problem, especially for gene expression data. Failing to remove duplicates can make study results questionable. We developed a Bioconductor package, DupChecker, that efficiently identifies duplicated samples by generating MD5 fingerprints for raw data.
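The fingerprinting idea can be illustrated with a minimal sketch. This is not DupChecker's own R code or API. It is a generic Python illustration of the technique, assuming raw data files sit in per-dataset subdirectories of one root directory. The helper names `md5_fingerprint` and `find_duplicates` and the default `*.CEL` file pattern are hypothetical choices made for this example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_fingerprint(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root_dir, pattern="*.CEL"):
    """Group files under root_dir by MD5 digest and keep groups with >1 file.

    The "*.CEL" default is an assumption for this sketch (a common raw
    microarray file extension); pass another glob pattern as needed.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root_dir).rglob(pattern)):
        if path.is_file():
            groups[md5_fingerprint(path)].append(str(path))
    # Only digests shared by two or more files indicate duplicated samples.
    return {md5: paths for md5, paths in groups.items() if len(paths) > 1}
```

Calling `find_duplicates("path/to/datasets")` returns a dictionary that maps each shared digest to the list of files carrying it. Two files with identical bytes always share a digest, so byte-identical raw files submitted under different dataset accessions end up in the same group.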

Author
Quanhu Sheng, Yu Shyr, Xi Chen
Maintainer
"Quanhu SHENG" <shengqh@gmail.com>
License
GPL (>= 2)
Version
1.12.0

Man pages

arrayExpressDownload
buildFileTable
geoDownload
validateFile

Files in this package

DupChecker/DESCRIPTION
DupChecker/NAMESPACE
DupChecker/R
DupChecker/R/DupChecker.R
DupChecker/README.md
DupChecker/build
DupChecker/build/vignette.rds
DupChecker/inst
DupChecker/inst/CITATION
DupChecker/inst/doc
DupChecker/inst/doc/DupChecker.R
DupChecker/inst/doc/DupChecker.Rnw
DupChecker/inst/doc/DupChecker.pdf
DupChecker/man
DupChecker/man/arrayExpressDownload.Rd
DupChecker/man/buildFileTable.Rd
DupChecker/man/geoDownload.Rd
DupChecker/man/validateFile.Rd
DupChecker/vignettes
DupChecker/vignettes/DupChecker.Rnw