match_names takes two data frames, merges them on the values in the fixed columns, and computes several measures of agreement for the columns listed in partials.
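The merge step can be pictured with base R's merge() applied to the data from the Examples. This is a minimal sketch that assumes match_names performs an inner join on the fixed columns, which this page does not state. The suffixes _1 and _2 are hypothetical choices for illustration, not the function's actual column names.

```r
# Sketch of the "merge on fixed" step using base R only.
# Assumption: an inner join on the fixed column(s); match_names may differ.
df1 <- data.frame(x = c(1, 1, 2, 2, 3),
                  y = c("tricycle", "bicycle", "triplane",
                        "double triplane", "triceratops"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x = c(2, 3, 2, 1, 1),
                  y = c("tritip", "biceratops", "triplane",
                        "tripline", "tricycle"),
                  stringsAsFactors = FALSE)

# Each row of df1 is paired with every row of df2 sharing the same x,
# so the partial column y appears twice (hypothetical suffixes _1 and _2).
merged <- merge(df1, df2, by = "x", suffixes = c("_1", "_2"))
nrow(merged)  # 9 = 2*2 (x = 1) + 2*2 (x = 2) + 1*1 (x = 3)
```

Every candidate pair that agrees on the fixed columns ends up on its own row, which is what makes row-wise agreement measures on the partial columns possible.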
df1, df2 | data frames to be merged
fixed    | character vector of columns to be merged on
partials | columns to fuzzy match on
edits    | if …
regex    | if …
match_names is designed to help find duplicates within a data set or matches between similar data sets. Typically you choose as fixed the columns whose values are most likely to match exactly (such as DOB), run the function, and then sort the result by one of the agreement measures. A small edit distance or proportion indicates a likely match.
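The "merge, then sort by a measure" workflow can be sketched in base R. This is an illustration under stated assumptions: an inner join on x, and utils::adist() (Levenshtein distance) standing in for whatever edit distance match_names uses. The y_count and y_prop columns only mimic the _count and _prop naming described under Value.

```r
# Sketch of ranking candidate pairs by an agreement measure (base R only).
# Assumptions: inner join on x; utils::adist() as the edit distance.
df1 <- data.frame(x = c(1, 1, 2, 2, 3),
                  y = c("tricycle", "bicycle", "triplane",
                        "double triplane", "triceratops"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x = c(2, 3, 2, 1, 1),
                  y = c("tritip", "biceratops", "triplane",
                        "tripline", "tricycle"),
                  stringsAsFactors = FALSE)
pairs <- merge(df1, df2, by = "x", suffixes = c("_1", "_2"))

# Edit distance for each row's pair of names (not the full cross matrix).
pairs$y_count <- mapply(function(a, b) drop(utils::adist(a, b)),
                        pairs$y_1, pairs$y_2, USE.NAMES = FALSE)
# Distance relative to the mean length of the two names.
pairs$y_prop <- pairs$y_count / ((nchar(pairs$y_1) + nchar(pairs$y_2)) / 2)

# Smallest proportions first: the most likely matches rise to the top.
ranked <- pairs[order(pairs$y_prop), ]
head(ranked, 3)  # the two exact matches (distance 0) come first
```

Sorting on the proportion rather than the raw count keeps long names from being penalized simply for having more characters to disagree on.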
A data frame composed of df1 and df2 merged. Additional columns may include _count columns, which hold edit distances; _prop columns, which give the ratio of the edit distance to the mean number of characters; and _regex columns, which indicate whether a subset of one name matches the other.
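To make the three kinds of measure concrete, the sketch below computes them for two pairs of names taken from the Examples data. pair_measures() is a hypothetical helper, not part of the package. It uses utils::adist() for the edit distance and assumes the regex check means "either name contains the other as a literal substring", which may not be exactly what match_names does.

```r
# Hypothetical helper illustrating _count, _prop and _regex-style measures.
# Assumption: "regex" = either name contains the other as a literal substring.
pair_measures <- function(a, b) {
  # Levenshtein edit distance for each pair
  count <- mapply(function(s, t) drop(utils::adist(s, t)), a, b,
                  USE.NAMES = FALSE)
  # edit distance divided by the mean number of characters
  prop <- count / ((nchar(a) + nchar(b)) / 2)
  # does one name appear inside the other?
  regex <- mapply(function(s, t) grepl(s, t, fixed = TRUE) ||
                    grepl(t, s, fixed = TRUE),
                  a, b, USE.NAMES = FALSE)
  data.frame(a = a, b = b, count = count, prop = prop, regex = regex,
             stringsAsFactors = FALSE)
}

res <- pair_measures(c("bicycle", "double triplane"),
                     c("tricycle", "triplane"))
res  # count: 2, 7; prop: about 0.267, 0.609; regex: FALSE, TRUE
```

The second pair shows why a substring check is useful alongside the distances: "double triplane" and "triplane" are far apart by edit distance (7 / 11.5) yet one name is contained in the other.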
Sven Halvorson (svenedmail@gmail.com)
# two small data frames sharing a key column (x) and a name column (y)
df1 <- data.frame(x = c(1, 1, 2, 2, 3),
                  y = c("tricycle", "bicycle", "triplane",
                        "double triplane", "triceratops"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x = c(2, 3, 2, 1, 1),
                  y = c("tritip", "biceratops", "triplane",
                        "tripline", "tricycle"),
                  stringsAsFactors = FALSE)
# merge on x and compute agreement measures for y
df3 <- match_names(df1, df2, fixed = "x", partials = "y")