DupChecker: a package for checking high-throughput genomic data redundancy in meta-analysis

Meta-analysis has become a popular approach to high-throughput genomic data analysis because it can often substantially increase the power to detect biological signals or patterns in datasets. However, when publicly available databases are used for meta-analysis, duplicated samples are a frequently encountered problem, especially for gene expression data. Leaving duplicates in place makes study results questionable. We developed DupChecker, a Bioconductor package that efficiently identifies duplicated samples by generating MD5 fingerprints for raw data.
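The fingerprinting idea can be illustrated with base R alone. The sketch below is not DupChecker's own code: `find_duplicate_files` is a hypothetical helper, and the three small text files are made-up stand-ins for raw data files. It assumes that two samples count as duplicates when their files are byte-for-byte identical, which is what matching MD5 checksums indicate.

```r
# Hypothetical helper (not part of DupChecker): report files whose MD5
# checksum is shared with at least one other file in the input.
find_duplicate_files <- function(paths) {
  # tools::md5sum ships with base R and returns one checksum per file
  md5 <- unname(tools::md5sum(paths))
  # Checksums that occur more than once
  dup_md5 <- unique(md5[duplicated(md5)])
  is_dup <- md5 %in% dup_md5
  data.frame(file = basename(paths[is_dup]),
             md5 = md5[is_dup],
             stringsAsFactors = FALSE)
}

# Made-up demo data: sampleA and sampleB have identical content,
# sampleC differs.
demo_dir <- tempfile("dupdemo")
dir.create(demo_dir)
writeLines(c("probe1\t10.5", "probe2\t3.2"), file.path(demo_dir, "sampleA.txt"))
writeLines(c("probe1\t10.5", "probe2\t3.2"), file.path(demo_dir, "sampleB.txt"))
writeLines(c("probe1\t7.1", "probe2\t4.8"), file.path(demo_dir, "sampleC.txt"))

files <- list.files(demo_dir, full.names = TRUE)
dups <- find_duplicate_files(files)
print(dups)
```

Comparing checksums instead of file contents means each file is read only once, however many datasets are being compared.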

Install the latest version of this package by entering the following in R:
source("https://bioconductor.org/biocLite.R")
biocLite("DupChecker")
Author: Quanhu Sheng, Yu Shyr, Xi Chen
Bioconductor views: Preprocessing
Date of publication: None
Maintainer: "Quanhu SHENG" <shengqh@gmail.com>
License: GPL (>= 2)
Version: 1.12.0

Files

DESCRIPTION
NAMESPACE
R
R/DupChecker.R
README.md
build
build/vignette.rds
inst
inst/CITATION
inst/doc
inst/doc/DupChecker.R
inst/doc/DupChecker.Rnw
inst/doc/DupChecker.pdf
man
man/arrayExpressDownload.Rd
man/buildFileTable.Rd
man/geoDownload.Rd
man/validateFile.Rd
vignettes
vignettes/DupChecker.Rnw
