Use the stringdist package to perform a fuzzy match on two datasets.
data1 | data.frame. First to-merge dataset. |
data2 | data.frame. Second to-merge dataset. |
by | character string. Variables to merge on (common across data1 and data2). See |
by.x | character string. Variable to merge on in data1. See |
by.y | character string. Variable to merge on in data2. See |
suffixes | character vector of length 2. Suffixes to add to like-named variables after the merge. See |
unique_key_1 | character vector. Primary key of data1 that uniquely identifies each row (can be multiple fields). |
unique_key_2 | character vector. Primary key of data2 that uniquely identifies each row (can be multiple fields). |
fuzzy_settings | list of arguments to pass to the fuzzy matching function. See |
The amatch function from stringdist computes string distances between every
pair of strings in two vectors, then picks the closest string pair for each
observation in the dataset. fuzzy_match uses this to perform
a string distance-based match between two datasets. This process can take quite a long time;
for quicker matches, try adjusting the nthread argument in fuzzy_settings.
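The closest-pair step described above can be sketched directly with stringdist::amatch. This is only an illustration of the underlying call, not the internals of fuzzy_match: the company names are made up, and the method, p, maxDist and nthread values are assumptions chosen for the example rather than the package defaults.

```r
library(stringdist)

# Made-up company names, used only for illustration
names1 <- c("acme corp", "globex inc", "initech llc")
names2 <- c("initech", "acme corporation", "globex incorporated")

# For each element of names1, amatch() returns the position of the closest
# element of names2 whose distance is at most maxDist, or NA otherwise.
idx <- amatch(
  names1, names2,
  method  = "jw",  # Jaro-Winkler distance, scaled to [0, 1]
  p       = 0.1,   # Winkler prefix weight
  maxDist = 0.3,   # largest distance still accepted as a match (assumed value)
  nthread = 2      # threads used for the distance computation
)

# Pair each name with its closest counterpart
closest <- data.frame(name1 = names1, name2 = names2[idx],
                      stringsAsFactors = FALSE)
print(closest)
```

Raising nthread spreads the pairwise distance computation over more threads, which is where most of the run time goes on large inputs.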
The default fuzzy_settings are sensible starting points for company name matching,
but adjusting these can greatly change how the match performs.
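As a rough sketch of how custom settings might be passed, the list below uses argument names from stringdist::amatch, on the assumption that fuzzy_settings is forwarded to that function. The data frames, column names and setting values are hypothetical, and the call is guarded because the exact defaults and any required list elements are not stated here.

```r
# Hypothetical input data; id1 and id2 act as the unique keys
firms1 <- data.frame(id1 = 1:2,
                     name = c("acme corp", "globex inc"),
                     stringsAsFactors = FALSE)
firms2 <- data.frame(id2 = 1:2,
                     firm_name = c("acme corporation", "globex incorporated"),
                     stringsAsFactors = FALSE)

# Assumed settings: names are amatch() arguments, values are illustrative
my_settings <- list(
  method  = "jw",   # Jaro-Winkler distance
  p       = 0.1,    # Winkler prefix weight
  maxDist = 0.2,    # looser threshold, so more candidate matches are accepted
  matchNA = FALSE,  # do not match NA to NA
  nthread = 2       # more threads for quicker matches
)

# Run only if the package providing fuzzy_match() is loaded
if (exists("fuzzy_match")) {
  matched <- fuzzy_match(
    data1 = firms1, data2 = firms2,
    by.x = "name", by.y = "firm_name",
    unique_key_1 = "id1", unique_key_2 = "id2",
    fuzzy_settings = my_settings
  )
}
```

Tightening maxDist trades recall for precision, so it is worth inspecting a sample of matched pairs after each change.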
A data.table containing the merged data set, including all columns from both data sets.