fuzzyMerge: Fuzzy Matching for Merging Data Frames


Description

Merges two data frames on a single shared column. Only left merges are supported. Exact matches are checked first, followed by successive rounds of fuzzy matching, one per value in distance. If multiple values match, one is chosen at random.
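
The matching order described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name sketch_match, its arguments, and the sample data are hypothetical, and only match(), agrep(), and sample.int() from base R are used.

```r
# Hypothetical helper (not part of the package): exact matches first,
# then agrep() passes with an increasing max.distance.
sketch_match <- function(x, table, distances = c(1, 2, 3),
                         costs = list(ins = 2, del = 1, sub = 3)) {
  idx <- match(x, table)                     # pass 0: exact matches only
  for (d in distances) {                     # one fuzzy pass per distance
    for (i in which(is.na(idx))) {           # only values still unmatched
      hits <- agrep(x[i], table, max.distance = d, costs = costs)
      if (length(hits) > 0) {
        # pick one candidate at random; sample.int() avoids the
        # sample(hits, 1) pitfall when hits has length 1
        idx[i] <- hits[sample.int(length(hits), 1)]
      }
    }
  }
  idx                                        # row index into table, or NA
}

set.seed(1)  # only matters when several candidates tie
sketch_match(c("apple", "banana", "zzzzzz"),
             c("apple", "bananna", "cherry"))
```

The result is an index vector with one entry per value of x, so it can be used to pull rows from the second data frame while keeping the row count of the first.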

Usage

fuzzyMerge(dfX, dfY, by = intersect(names(dfX), names(dfY))[1], byX = by,
  byY = by, costs = list(ins = 2, del = 1, sub = 3), distance = c(0, 1, 2,
  3, 5, 7, 10, 15, 20), keepOriginal = FALSE, ...)

Arguments

dfX

first data frame to match. The returned data frame will have the same number of rows as this data frame.

dfY

second data frame to match. Note: there should be no duplicates in the matching column in this data frame!

by

column name (or number) in the data frames to use for matching. Only a single column is supported. Defaults to the first column name shared by dfX and dfY.

byX

column name in dfX to match on, if the column names differ between the two data frames. Defaults to by.

byY

column name in dfY to match on, if the column names differ between the two data frames. Defaults to by.

costs

costs assigned to string edits (insertions, deletions, substitutions). See agrep for details.

distance

vector of maximum distances for fuzzy matching. See agrep for details. Length corresponds to the number of matching iterations.

keepOriginal

if TRUE, adds the column "Original" to the final data frame, holding the original values of the matching column.

...

parameters sent to agrep for fuzzy matching
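
To see what costs and distance mean to agrep, the following sketch calls agrep directly with the default costs from Usage and two of the default distance values. The pattern and candidate strings are made up for illustration, and how fuzzyMerge itself forwards these arguments is an assumption based on the argument descriptions above.

```r
costs <- list(ins = 2, del = 1, sub = 3)      # defaults shown in Usage
table <- c("kitten", "mitten", "zzzzzz")      # hypothetical candidates

# distance 0: only candidates that contain the pattern unchanged
exact <- agrep("kitten", table, max.distance = 0, costs = costs)

# distance 3: a looser bound, so the one-letter variant is accepted too
loose <- agrep("kitten", table, max.distance = 3, costs = costs)

table[exact]
table[loose]
```

Note that agrep searches for the pattern within each element, so a short pattern can match inside a longer string even at distance 0.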

Value

a data frame with the same number of rows as dfX, containing the columns of dfX plus the matched columns from dfY. The matched column keeps the name given by byX (by default, by).
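
The shape of a left merge can be illustrated in base R. This is a sketch under stated assumptions, not the function's code: the data frames and column names are hypothetical, and exact matching with match() stands in for the fuzzy step.

```r
# Hypothetical data; the key is called "name" in dfX and "person" in dfY.
dfX <- data.frame(name = c("Ann", "Bob", "Cy", "Bob"), x = 1:4,
                  stringsAsFactors = FALSE)
dfY <- data.frame(person = c("Bob", "Ann"), y = c(10, 20),
                  stringsAsFactors = FALSE)

idx    <- match(dfX$name, dfY$person)     # one dfY row (or NA) per dfX row
extra  <- setdiff(names(dfY), "person")   # dfY columns other than the key
merged <- cbind(dfX, dfY[idx, extra, drop = FALSE])
rownames(merged) <- NULL                  # drop row names inherited from dfY

nrow(merged) == nrow(dfX)   # TRUE: the unmatched row ("Cy") is kept with NA
```

Because each row of dfX receives at most one row of dfY, duplicate keys in dfY would make the choice of match ambiguous, which is why the matching column of dfY should be free of duplicates.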

See Also

agrep


mnblonsky/REMI documentation built on May 23, 2019, 5:06 a.m.