View source: R/deduplication_functions.R
Given a data frame and a field to check for duplicates, flags and removes duplicate entries using one of three methods.
deduplicate(df, field, method = c("quick", "similarity", "fuzzy"),
  language = "English", cutoff_distance = 2)
df | the data frame to deduplicate
field | the name or index of the column to check for duplicate values
method | the manner of duplicate detection; quick removes exact text duplicates, similarity removes duplicates below a similarity threshold, and fuzzy uses fuzzdist matching
language | the language to use if method is set to similarity
cutoff_distance | the threshold below which articles are marked as duplicates by the similarity method
a deduplicated data frame
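To make the quick method concrete, here is a minimal base R sketch of exact-text deduplication on one column. It is an illustration only, not the package's implementation: the helper name `quick_dedup_sketch` and the sample data frame `refs` are hypothetical, and the real function may normalise text or flag duplicates differently.

```r
# Hypothetical helper illustrating exact-match deduplication;
# this is NOT the source of deduplicate().
quick_dedup_sketch <- function(df, field) {
  # `field` may be a column name or a column index, as in the documented argument
  values <- as.character(df[[field]])
  # duplicated() marks every occurrence after the first; keep missing values
  is_dup <- duplicated(values) & !is.na(values)
  df[!is_dup, , drop = FALSE]
}

# Made-up example data
refs <- data.frame(
  id = 1:4,
  title = c("Bird migration", "Fish ecology", "Bird migration", "Bird migration "),
  stringsAsFactors = FALSE
)

result <- quick_dedup_sketch(refs, "title")
print(result)
```

Row 3 is dropped because its title matches row 1 exactly, while row 4 survives because of its trailing space. Near-matches like that are the kind of case the similarity and fuzzy methods are described as handling.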