deduplicate: Remove duplicate entries from a data frame


View source: R/deduplication_functions.R

Description

Given a data frame and a field to check for duplicates, flags and removes duplicate entries using one of three available methods.

Usage

deduplicate(df, field, method = c("quick", "similarity", "fuzzy"),
  language = "English", cutoff_distance = 2)

Arguments

df

the data frame to deduplicate

field

the name or index of the column to check for duplicate values

method

how duplicates are detected: "quick" removes exact text duplicates, "similarity" removes entries that fall below a distance threshold (see cutoff_distance), and "fuzzy" uses fuzzdist matching

language

the language to use if method is set to similarity

cutoff_distance

the threshold below which articles are marked as duplicates by the similarity method

Value

a deduplicated data frame
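
The difference between exact matching and threshold-based matching described under method and cutoff_distance can be illustrated with a minimal base-R sketch. This is not the package's implementation: the helpers drop_exact and drop_near, the toy data, and the use of edit distance from utils::adist are all assumptions made for illustration, and the package's "similarity" and "fuzzy" methods may compute distances differently.

```r
# Toy data; the column names and values are hypothetical.
records <- data.frame(
  id = 1:4,
  title = c("Forest birds", "Forest birds", "Forest bird", "Wetland plants"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of synthesisr): keep the first occurrence of
# each exact value. `field` may be a column name or an index, since `[[`
# accepts both.
drop_exact <- function(df, field) {
  df[!duplicated(df[[field]]), , drop = FALSE]
}

# Hypothetical helper (not part of synthesisr): drop a row when its value is
# fewer than `cutoff` edits away from an earlier row that was kept.
# Edit distance via utils::adist is an assumption, not the package's metric.
drop_near <- function(df, field, cutoff = 2) {
  values <- as.character(df[[field]])
  d <- utils::adist(values)
  keep <- rep(TRUE, length(values))
  for (i in seq_along(values)[-1]) {
    earlier <- which(keep[seq_len(i - 1)])
    if (any(d[i, earlier] < cutoff)) keep[i] <- FALSE
  }
  df[keep, , drop = FALSE]
}

exact <- drop_exact(records, "title")    # column given by name
near <- drop_near(records, 2, cutoff = 2) # column given by index
```

With this toy data, exact matching keeps "Forest bird" because it differs from "Forest birds" by one character, whereas the distance-based helper treats it as a duplicate because one edit is below the cutoff of 2.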


elizagrames/synthesisr documentation built on May 26, 2019, 10:34 a.m.