findDuplicates: findDuplicates

View source: R/findDuplicates.R

Description

Finds possible duplicates in a data set.

Usage

findDuplicates(data, var, dmax = 3, exclude = c("", "."),
  ignore.case = FALSE)

Arguments

data

data frame

var

character: name of variable

dmax

maximal Levenshtein distance for matching in text variables ($l(t_{i1}, t_{j2}) < dmax$), defaults to 3

exclude

entries to be excluded from the unique values, defaults to c('', '.')

ignore.case

if FALSE, the unique values are case sensitive; if TRUE, case is ignored
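
The interplay of dmax, exclude and ignore.case can be illustrated with base R alone. The sketch below does not reproduce the internals of findDuplicates, which are not documented on this page; it only assumes that a Levenshtein distance such as the one computed by utils::adist() is compared against the threshold. The vector codes and all other object names are hypothetical.

```r
# Hypothetical values, not taken from the package's test data
codes <- c("AB123", "ab123", "AB124", "XY999", "", ".")

# Drop the excluded entries and keep the unique values (cf. `exclude`)
vals <- setdiff(unique(codes), c("", "."))

# Pairwise Levenshtein distances with base R's utils::adist()
d <- adist(vals)

# Candidate pairs: distinct values whose distance is below the threshold (cf. `dmax`)
dmax <- 3
idx <- which(d < dmax & upper.tri(d), arr.ind = TRUE)
pairs <- data.frame(a    = vals[idx[, "row"]],
                    b    = vals[idx[, "col"]],
                    dist = d[idx])
print(pairs)

# With case ignored, "AB123" and "ab123" no longer differ (cf. `ignore.case`)
d_nocase <- adist("AB123", "ab123", ignore.case = TRUE)
print(d_nocase)
```

A smaller threshold yields fewer candidate pairs, while ignoring case collapses values that differ only in capitalisation.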

Value

a list structure with possible duplicates

Examples

set.seed(0)
# create two data sets, t1 and t2, with
# 200 obs. only in t1, 200 obs. in both t1 and t2, and
# 100 obs. only in t2
n <- list(c(200, 1), c(200, 1, 2), c(100, 2))
x <- generateTestData(n)
# look for possible duplicates in the variable 'code' of x[[1]]
match <- findDuplicates(x[[1]], 'code')
head(match)

sigbertklinke/findMatch documentation built on July 12, 2019, 9:22 a.m.