dupes_find_1way: Identify previously documented publication records

Description

Given two sets of publication records, one old and one new, combines them and marks each new record that duplicates a record in the old set.

Usage

dupes_find_1way(old, new, match_cols, approx_match = FALSE, string_dist = 5,
  min_length = 10, simplify_match = TRUE)

Arguments

old

The previous set of publication records

new

The new set of publication records

match_cols

Column(s) that will be used to search for duplicate records

approx_match

Whether to search for duplicates using string distances (TRUE) or exact values (FALSE, the default)

string_dist

When using approximate matching, the string distance cutoff at which records will be assumed to be duplicates

min_length

The minimum string length for match_cols at which a record will be considered when searching for duplicates

simplify_match

Whether to search for duplicates after removing all non-alphanumeric characters from the reference string generated from match_cols
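
The interaction between these arguments may be easier to see in code. The following is a minimal base-R sketch of one way such matching could work; it is not the sysreviewR implementation. The helpers make_ref and find_in_old are hypothetical names introduced here, and both the use of utils::adist (edit distance) and the "less than or equal to string_dist" comparison are assumptions.

```r
# Hypothetical helper (not part of sysreviewR): build one reference string
# per row by pasting the match_cols together.
make_ref <- function(df, match_cols, simplify_match = TRUE) {
  ref <- do.call(paste0, unname(lapply(df[match_cols], as.character)))
  if (simplify_match) {
    # Drop every character that is not a letter or a digit
    ref <- gsub("[^[:alnum:]]", "", ref)
  }
  ref
}

# Hypothetical helper: for each row of `new`, return the index of the first
# matching row in `old`, or NA when no match is found.
find_in_old <- function(old, new, match_cols, approx_match = FALSE,
                        string_dist = 5, min_length = 10,
                        simplify_match = TRUE) {
  old_ref <- make_ref(old, match_cols, simplify_match)
  new_ref <- make_ref(new, match_cols, simplify_match)
  # Reference strings shorter than min_length are never considered
  old_ok <- nchar(old_ref) >= min_length
  vapply(new_ref, function(r) {
    if (nchar(r) < min_length) return(NA_integer_)
    hit <- if (approx_match) {
      # Assumption: edit distance at or below the cutoff counts as a match
      which(old_ok & utils::adist(r, old_ref)[1, ] <= string_dist)
    } else {
      which(old_ok & old_ref == r)
    }
    if (length(hit) > 0) hit[1] else NA_integer_
  }, integer(1), USE.NAMES = FALSE)
}

# Toy data, invented for illustration only
old <- data.frame(title = c("Deep learning for reviews.",
                            "A study of record linkage"),
                  year = c(2015, 2017), stringsAsFactors = FALSE)
new <- data.frame(title = c("A study of record-linkage",
                            "An unrelated new article"),
                  year = c(2017, 2019), stringsAsFactors = FALSE)

# Exact matching on simplified strings: the hyphen is stripped first
exact_simplified <- find_in_old(old, new, c("title", "year"))

# Exact matching on raw strings: the hyphen now prevents a match
exact_raw <- find_in_old(old, new, c("title", "year"),
                         simplify_match = FALSE)

# Approximate matching on raw strings: one edit is within the cutoff
approx_raw <- find_in_old(old, new, c("title", "year"),
                          approx_match = TRUE, simplify_match = FALSE)
```

In this sketch, simplify_match and approx_match are two separate ways of tolerating small formatting differences: the first normalises punctuation and spacing before comparing, while the second accepts a bounded number of character edits.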

Value

The combined records from old and new. Records from new that were found in old are indicated by paired match_IDs (match_ID).

Examples

## Not run: 
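# Rows 8 to 10 of form_mm_recs appear in both old and new
old <- form_mm_recs[1:10, ]
new <- form_mm_recs[8:12, ]
# Search for duplicates using columns 1 and 3
test <- dupes_find_1way(old, new, c(1, 3))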
dupes <- dupes_return(test)
out <- dupes_rm_1way(test)

## End(Not run)

graggsd/sysreviewR documentation built on May 16, 2019, 2:52 a.m.