Meta-analysis has become a popular approach for high-throughput genomic data analysis because it can often significantly increase the power to detect biological signals or patterns in datasets. However, when publicly available databases are used for meta-analysis, sample duplication is a frequently encountered problem, especially for gene expression data. Leaving duplicates in place makes study results questionable. We developed a Bioconductor package, DupChecker, that efficiently identifies duplicated samples by generating MD5 fingerprints for raw data.
|Author||Quanhu Sheng, Yu Shyr, Xi Chen|
|Date of publication||None|
|Maintainer||"Quanhu SHENG" <email@example.com>|
|License||GPL (>= 2)|
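
The fingerprinting idea in the description above can be illustrated with a minimal sketch. This is not DupChecker's own API (the package is written in R); it is a Python illustration of the general technique, and the function names `md5_fingerprint` and `find_duplicates` are hypothetical. It assumes each sample is stored as one raw data file on disk.

```python
import hashlib
from collections import defaultdict


def md5_fingerprint(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group files by MD5 fingerprint and keep only groups with more than one file.

    Hypothetical helper for illustration; not part of DupChecker.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[md5_fingerprint(path)].append(str(path))
    return {md5: sorted(files) for md5, files in groups.items() if len(files) > 1}
```

Calling `find_duplicates` on a list of raw data file paths returns a dictionary that maps each shared fingerprint to the files carrying it. A fingerprint of this kind only matches byte-identical files, so the same sample stored with a different header or compression would not be flagged by this sketch.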