PGRdup: Discover Probable Duplicates in Plant Genetic Resources Collections

Provides functions to aid the identification of probable/possible duplicates in Plant Genetic Resources (PGR) collections using 'passport databases' comprising information records of each constituent sample. These include methods for cleaning the data, creating a searchable Key Word in Context (KWIC) index of keywords associated with sample records, and identifying nearly identical records with similar information by fuzzy, phonetic and semantic matching of keywords.
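
The three stages described above (cleaning, KWIC indexing, duplicate matching) can be sketched roughly as follows. This is a minimal sketch, not a definitive recipe: it assumes the bundled GN1000 example dataset has the fields named below, that the first field passed to KWIC() serves as the primary key, and that the argument names match the installed version of the package. Check the DataClean, KWIC and ProbDup man pages before relying on it.

```r
library(PGRdup)

# Example passport dataset shipped with the package
GN <- GN1000

# Fields to index (assumed column names); the first is taken as the primary key
GNfields <- c("NationalID", "CollNo", "DonorID", "OtherID1", "OtherID2")

# Step 1: clean the text in each selected field
GN[GNfields] <- lapply(GN[GNfields], function(x) DataClean(x))

# Step 2: build the searchable KWIC index of keywords
GNKWIC <- KWIC(GN, GNfields)

# Step 3: look for probable duplicates within the single index (method "a"),
# using fuzzy and phonetic matching but no semantic matching
GNdup <- ProbDup(kwic1 = GNKWIC, method = "a",
                 fuzzy = TRUE, phonetic = TRUE, semantic = FALSE)

# Print a summary of the probable duplicate sets found
GNdup
```

Semantic matching is left off here because it needs a user-supplied list of synonym sets; the other two matching modes work from the keywords alone.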

Install the latest version of this package by entering the following in R:
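
Assuming the package is installed from CRAN with base R's install.packages(), a minimal version of that command would be:

```r
# Download and install PGRdup from the configured CRAN mirror
install.packages("PGRdup")

# Attach the package for the current session
library(PGRdup)
```
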
Author: J. Aravind [aut, cre], J. Radhamani [aut], Kalyani Srinivasan [aut], B. Ananda Subhash [aut], R. K. Tyagi [aut]
Date of publication: 2017-03-15 09:57:09
Maintainer: J. Aravind
License: GPL-2 | GPL-3



AddProbDup Man page
DataClean Man page
DisProbDup Man page
DoubleMetaphone Man page
GN1000 Man page
KWCounts Man page
KWIC Man page
MergeKW Man page
MergePrefix Man page
MergeProbDup Man page
MergeSuffix Man page
ParseProbDup Man page
PGRdup Man page
PGRdup-package Man page
print.KWIC Man page
print.ProbDup Man page
ProbDup Man page
read.genesys Man page
ReconstructProbDup Man page
ReviewProbDup Man page
SplitProbDup Man page
ValidatePrimKey Man page
ViewProbDup Man page
