fuzzy_match | R Documentation |
Use the stringdist package to perform a fuzzy match on two datasets.
fuzzy_match(
data1,
data2,
by = NULL,
by.x = NULL,
by.y = NULL,
suffixes,
unique_key_1,
unique_key_2,
fuzzy_settings = list(method = "jw", p = 0.1, maxDist = 0.05, matchNA = FALSE,
nthread = getOption("sd_num_thread"))
)
data1 |
data.frame. First dataset to merge. |
data2 |
data.frame. Second dataset to merge. |
by |
character string. Variables to merge on (common to data1 and data2). See |
by.x |
character string. Variable to merge on in data1. See |
by.y |
character string. Variable to merge on in data2. See |
suffixes |
character vector of length 2. Suffixes to add to like-named variables after the merge. See |
unique_key_1 |
character vector. Primary key of data1 that uniquely identifies each row (can be multiple fields) |
unique_key_2 |
character vector. Primary key of data2 that uniquely identifies each row (can be multiple fields) |
fuzzy_settings |
list of arguments to pass to the fuzzy matching function. See |
stringdist's amatch computes string distances between every pair of strings in two vectors, then picks the closest string pair for each observation in the dataset. fuzzy_match uses this to perform a string distance-based match between two datasets. This process can take a long time; for quicker matches, try adjusting the nthread argument in fuzzy_settings.
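The closest-string lookup described above can be sketched by calling amatch directly. This is a minimal illustration, not the internals of fuzzy_match: the company names are made-up sample data, and the settings simply mirror the defaults shown in the usage section.

```r
library(stringdist)

# Hypothetical sample data: names to look up and a table to search.
lookup <- c("Acme Corp", "Globex Inc", "Umbrella")
candidates <- c("Acme Corp.", "Initech", "Globex Inc")

# For each element of `lookup`, amatch returns the index of the closest
# element of `candidates` whose distance is at most maxDist, or NA if
# no candidate is close enough.
idx <- amatch(
  lookup, candidates,
  method = "jw", p = 0.1, maxDist = 0.05, matchNA = FALSE
)

# Pair each lookup name with its matched candidate (NA where unmatched).
data.frame(lookup = lookup, matched = candidates[idx])
```

Names with no candidate within maxDist come back as NA, which is why a stricter maxDist tends to produce fewer matched rows.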
The default fuzzy_settings are sensible starting points for company name matching,
but adjusting these can greatly change how the match performs.
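To see how much the settings matter, the sketch below compares one pair of names under different values of p and maxDist, using stringdist directly. The names and the looser threshold of 0.1 are illustrative assumptions, not recommendations.

```r
library(stringdist)

# Hypothetical pair: an abbreviated name and its long form.
short_name <- "Acme Corp"
candidates <- c("Acme Corporation", "Initech")

# Jaro-Winkler distance with the default prefix weight (p = 0.1)
# and plain Jaro distance (p = 0) for comparison.
d_jw <- stringdist(short_name, candidates[1], method = "jw", p = 0.1)
d_jaro <- stringdist(short_name, candidates[1], method = "jw", p = 0)

# The same lookup under a strict and a looser distance cutoff.
strict <- amatch(short_name, candidates, method = "jw", p = 0.1, maxDist = 0.05)
loose <- amatch(short_name, candidates, method = "jw", p = 0.1, maxDist = 0.1)

print(c(jw = d_jw, jaro = d_jaro))
print(c(strict = strict, loose = loose))
```

In this pair the abbreviation falls outside the strict cutoff but inside the looser one, so the same data yields a match or a miss depending only on maxDist.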
A data.table containing the merged dataset, including all columns from both datasets.
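A call might look like the following sketch. It assumes fuzzy_match is already available in the session and relies only on the signature shown in the usage section; the data frames, column names (id_1, id_2, name), suffixes, and settings are all hypothetical. The call is guarded so the script still runs if the function is not loaded.

```r
# Hypothetical input data: each dataset has its own unique key column.
firms_1 <- data.frame(
  id_1 = 1:3,
  name = c("Acme Corp", "Globex Inc", "Umbrella"),
  stringsAsFactors = FALSE
)
firms_2 <- data.frame(
  id_2 = 101:103,
  name = c("Acme Corp.", "Initech", "Globex Inc"),
  stringsAsFactors = FALSE
)

# Illustrative settings: a looser maxDist than the default and a single thread.
settings <- list(method = "jw", p = 0.1, maxDist = 0.1, matchNA = FALSE, nthread = 1)

if (exists("fuzzy_match", mode = "function")) {
  matched <- fuzzy_match(
    data1 = firms_1,
    data2 = firms_2,
    by = "name",
    suffixes = c("_1", "_2"),
    unique_key_1 = "id_1",
    unique_key_2 = "id_2",
    fuzzy_settings = settings
  )
  print(matched)
} else {
  message("fuzzy_match is not loaded; attach the package that provides it first.")
}
```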