find_potential_duplicates: Identify potential duplicates in a list of references

View source: R/find_potential_duplicates.R

find_potential_duplicates    R Documentation

Identify potential duplicates in a list of references

Description

Computes pairwise distances between reference titles and selects the titles that lie within a maximum distance of one another, so that they can be checked as potential duplicates.

Usage

find_potential_duplicates(x, distmethod = "qgram", maxdist = 10)

Arguments

x

Tibble. Table with keys and titles.

distmethod

Character. Method used to compute distances between titles. One of: osa, lv, dl, hamming, lcs, qgram, cosine, jaccard, jw, or soundex.

maxdist

Numeric. Threshold to apply. Only titles whose distance from another title is less than or equal to this number are returned for checking.

Value

A numeric vector with the row numbers of the potential duplicates.
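
The thresholding idea described above can be illustrated with a minimal sketch. This is not the package's implementation: it swaps in base R's utils::adist (Levenshtein distance, comparable to the lv method) instead of the default qgram method, and the column names key and title, the sample references, and the threshold of 3 are all assumptions made for illustration.

```r
# Hypothetical input: the column names `key` and `title` are assumptions,
# as are the sample references themselves.
refs <- data.frame(
  key = c("smith2020", "smith2020a", "doe2019", "lee2021"),
  title = c(
    "Deep learning for text classification",
    "Deep learning for text classifcation",
    "A survey of graph databases",
    "Bayesian methods in ecology"
  ),
  stringsAsFactors = FALSE
)

# Pairwise Levenshtein distances between lower-cased titles (base R).
d <- utils::adist(tolower(refs$title))

# Keep each unordered pair once and ignore the zero diagonal.
d[lower.tri(d, diag = TRUE)] <- NA

# Row numbers involved in at least one pair at or below the threshold.
maxdist <- 3
close_pairs <- which(d <= maxdist, arr.ind = TRUE)
flagged_rows <- sort(unique(as.vector(close_pairs)))
flagged_rows
```

With this sample data, rows 1 and 2 are flagged, because their titles differ by a single missing letter. Distances are on different scales depending on the method (edit counts for lv, values between 0 and 1 for cosine, jaccard, and jw), so a sensible maxdist depends on the distmethod chosen.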

Author(s)

Nicolas Mangin


NicolasJBM/bibliogr documentation built on June 1, 2024, 4:27 p.m.