match_names: Fuzzy matcher
In svenhalvorson/SvenSFPS: Sven's R Jiggering


`match_names` takes two data frames, merges them on the columns named in `fixed`, and computes measures of agreement for the columns named in `partials`.

match_names(df1, df2, fixed = NA, partials = NA, edits = TRUE, regex = TRUE)

`df1,df2`	data frames to be merged
`fixed`	character vector of columns to be merged on
`partials`	columns to fuzzy match on
`edits`	if `TRUE`, edit distances are computed
`regex`	if `TRUE`, partial regular expression matches are returned

`match_names` is designed to help find duplicates within a data set or matches between similar data sets. Typically you choose the `fixed` columns to be the values most likely to agree exactly between true matches (such as date of birth), run the function, and then sort the result by one or more of the measures. A small edit distance or proportion indicates a likely match.
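The workflow described above can be sketched in base R alone. This is not the implementation of `match_names`; it only illustrates the idea of merging on an exact key and ranking the candidate pairs by edit distance. The data frames `patients` and `registry`, their columns `dob` and `name`, and the column `name_dist` are all hypothetical names chosen for the illustration.

```r
# Hypothetical records; the column names `dob` and `name` are illustrative only.
patients <- data.frame(
  dob  = c("1980-01-02", "1980-01-02", "1975-06-30"),
  name = c("Jon Smith", "Ann Lee", "Maria Gomez"),
  stringsAsFactors = FALSE
)
registry <- data.frame(
  dob  = c("1980-01-02", "1975-06-30", "1975-06-30"),
  name = c("John Smith", "Mario Gomes", "Peter Hall"),
  stringsAsFactors = FALSE
)

# Merge on the exact-match column. Base R suffixes the clashing `name`
# columns as name.x (from patients) and name.y (from registry).
pairs <- merge(patients, registry, by = "dob")

# Levenshtein edit distance for each candidate pair, computed row by row
# with utils::adist().
pairs$name_dist <- mapply(
  function(a, b) as.integer(adist(a, b)),
  pairs$name.x, pairs$name.y,
  USE.NAMES = FALSE
)

# Sort so that the most likely matches (smallest distance) come first.
pairs <- pairs[order(pairs$name_dist), ]
print(pairs)
```

Sorting by the distance puts near-identical spellings at the top, which is where a manual review would normally start.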

A data frame consisting of `df1` and `df2` merged. Additional columns may include `_count` columns, which are edit distances; `_prop` columns, which are the ratio of the edit distance to the mean number of characters; and `_regex` columns, which indicate whether a subset of one name matches the other.
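To make the three kinds of measures concrete, here is a rough base R sketch of how such values could be computed for pairs of strings. The helper `agreement_measures` is hypothetical and is not part of SvenSFPS. It also makes a simplifying assumption: the containment check uses literal substring matching (`fixed = TRUE`), which may differ from how `match_names` builds its `_regex` columns.

```r
# Hypothetical helper, not part of SvenSFPS: computes an edit distance,
# a proportion, and a containment flag for two equal-length character vectors.
agreement_measures <- function(a, b) {
  # Levenshtein edit distance for each pair a[i], b[i].
  count <- mapply(
    function(s, t) as.integer(adist(s, t)),
    a, b, USE.NAMES = FALSE
  )
  # Edit distance relative to the mean length of the two strings.
  prop <- count / ((nchar(a) + nchar(b)) / 2)
  # TRUE when either string occurs literally inside the other
  # (assumption: literal matching stands in for the regex check).
  contained <- mapply(
    function(s, t) grepl(s, t, fixed = TRUE) || grepl(t, s, fixed = TRUE),
    a, b, USE.NAMES = FALSE
  )
  data.frame(a = a, b = b, count = count, prop = prop, regex = contained,
             stringsAsFactors = FALSE)
}

m <- agreement_measures(
  c("triplane", "double triplane", "tricycle"),
  c("triplane", "triplane", "tripline")
)
print(m)
```

Dividing by the mean length keeps long names from being penalised for a fixed number of edits, while the containment flag catches cases such as "double triplane" versus "triplane" that have a large edit distance but an obvious relationship.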

Sven Halvorson (svenedmail@gmail.com)

# Two small data frames that share the key column x
df1 = data.frame(x = c(1,1,2,2,3),
                 y = c("tricycle","bicycle", "triplane","double triplane", "triceratops"),
                 stringsAsFactors = FALSE)
df2 = data.frame(x = c(2,3,2,1,1),
                 y = c("tritip","biceratops", "triplane", "tripline", "tricycle"),
                 stringsAsFactors = FALSE)
# Merge on x and compute agreement measures for y
df3 = match_names(df1, df2, fixed = "x", partials = "y")