For a tabular set of publication records, identifies potential sets of duplicate entries and labels them with a unique identifier.
dupes_find(x, match_cols, approx_match = FALSE, string_dist = 5,
  min_length = 10, simplify_match = TRUE)
x | The dataset in which duplicate entries will be identified.
match_cols | Column(s) that will be used to search for duplicate records.
approx_match | Whether to perform a duplicate search using string distances or exact values.
string_dist | When using approximate matching, the string distance cutoff at which records will be assumed duplicated.
min_length | The minimum length for the combined matching string produced by
simplify_match | Whether to perform duplicate searches after removing all non-alphanumeric characters from the reference string generated from
An updated version of x, with one column specifying the final string used to search for duplicates (matching_col) and another column containing unique identifiers for each set of duplicates (match_ID).
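## Not run:
# Stack the records on themselves so that every entry has a duplicate
test <- rbind(form_mm_recs, form_mm_recs)
# Search for duplicates using columns 1 and 3 as the matching columns
test <- dupes_find(test, c(1, 3))
dupes <- dupes_return(test)
out <- dupes_rm(test)
## End(Not run)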
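The arguments above interact in ways that are easier to see in code. The following base-R sketch is not the package's implementation: sketch_dupes_find is a hypothetical helper that mimics the documented behaviour. It assumes that the matching string is built by pasting the match columns together, that strings shorter than min_length are never matched, that the string_dist cutoff is inclusive, and that each record takes the identifier of the first earlier record it matches. The column names matching_col and match_ID follow the Value section.

```r
# Hypothetical helper, NOT the package's code: a base-R sketch of how the
# documented arguments could combine to flag duplicate records.
sketch_dupes_find <- function(x, match_cols, approx_match = FALSE,
                              string_dist = 5, min_length = 10,
                              simplify_match = TRUE) {
  # Paste the chosen columns into one matching string per record.
  key <- do.call(paste0, unname(lapply(x[match_cols], as.character)))
  if (simplify_match) {
    # Drop every character that is not a letter or a digit.
    key <- gsub("[^[:alnum:]]", "", key)
  }
  n <- length(key)
  id <- seq_len(n)
  # Assumption: strings shorter than min_length are never matched.
  usable <- nchar(key) >= min_length
  if (approx_match) {
    # Pairwise edit distances; assumption: the cutoff is inclusive.
    hit <- utils::adist(key) <= string_dist
  } else {
    hit <- outer(key, key, "==")
  }
  # Give each record the ID of the first earlier record it matches.
  for (j in seq_len(n)) {
    if (!usable[j]) next
    prev <- seq_len(j - 1)
    earlier <- which(hit[j, prev] & usable[prev])
    if (length(earlier) > 0) id[j] <- id[earlier[1]]
  }
  x$matching_col <- key
  x$match_ID <- id
  x
}

# Made-up records for illustration only.
recs <- data.frame(
  title = c("Deep learning for birds", "Deep learning for birds.",
            "Deep learnin for birds", "Soil carbon in wetlands"),
  year = c(2019, 2019, 2019, 2020),
  stringsAsFactors = FALSE
)
res <- sketch_dupes_find(recs, "title", approx_match = TRUE, string_dist = 2)
exact <- sketch_dupes_find(recs, "title")
```

In this sketch, exact matching groups only the first two titles, which become identical once punctuation and spaces are stripped. Approximate matching with a cutoff of 2 also pulls in the third title, which differs by a single missing letter.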