DupChecker: a package for checking high-throughput genomic data redundancy in meta-analysis

Meta-analysis has become a popular approach for high-throughput genomic data analysis because it can often substantially increase the power to detect biological signals or patterns across datasets. However, when publicly available databases are used for meta-analysis, duplicated samples are a frequently encountered problem, especially for gene expression data. Failing to remove duplicates can make study results questionable. We developed DupChecker, a Bioconductor package that efficiently identifies duplicated samples by generating MD5 fingerprints for the raw data files.
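The fingerprinting idea can be illustrated in a few lines of base R. The following is a minimal sketch, not DupChecker's own API: it assumes the raw data files (for example, CEL files) are already on local disk, and the helper name `find_duplicate_files` is hypothetical. It hashes each file with `tools::md5sum()` and reports groups of files whose MD5 fingerprints are identical, which is how byte-identical samples reveal themselves even when they carry different names.

```r
# Hypothetical helper for illustration only; not part of DupChecker.
# Groups files whose MD5 fingerprints are identical.
find_duplicate_files <- function(files) {
  # tools::md5sum() returns a character vector of MD5 hashes named by file path
  hashes <- tools::md5sum(files)
  # Split file paths by hash; any hash shared by more than one file is a duplicate set
  groups <- split(names(hashes), unname(hashes))
  groups[lengths(groups) > 1]
}

# Usage example with small stand-in files (real raw data would be binary CEL files)
dir <- tempfile("raw_")
dir.create(dir)
writeLines("sample A", file.path(dir, "GSM1.CEL"))
writeLines("sample A", file.path(dir, "GSM2.CEL"))  # identical content, so identical MD5
writeLines("sample B", file.path(dir, "GSM3.CEL"))
dups <- find_duplicate_files(list.files(dir, full.names = TRUE))
print(dups)
```

Because MD5 is computed over file bytes, this approach only detects exact copies; samples that were renormalized or re-exported would not match.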

Package details

Author: Quanhu Sheng, Yu Shyr, Xi Chen
Bioconductor views: Preprocessing
Maintainer: "Quanhu SHENG" <shengqh@gmail.com>
License: GPL (>= 2)
Version: 1.25.0
Package repository: Bioconductor
Installation: Install the latest version of this package by entering the following in R:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("DupChecker")


DupChecker documentation built on April 28, 2020, 6:46 p.m.