Finds the closest string match between two data.tables.
The default method computes Jaro-Winkler string distances using the stringdist
package.
In cases with multiple closest matches, only the first match is reported.
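The matching rule described above can be sketched directly with the stringdist package. This is a minimal illustration under stated assumptions, not the function's actual implementation: the input vectors are made-up toy data, and the names source_strings, target_strings, best and result are hypothetical.

```r
library(stringdist)

# Hypothetical toy inputs (not taken from the package documentation)
source_strings <- c("aple", "bananna", "chery")
target_strings <- c("apple", "banana", "cherry", "cherry")

# Distance matrix: one row per source string, one column per target string
d <- stringdistmatrix(source_strings, target_strings, method = "jw")

# which.min() returns the index of the first minimum in each row,
# so tied closest matches resolve to the earliest target
best <- apply(d, 1, which.min)

# Assemble source string, closest target, and the distance of that match
result <- data.frame(
  source   = source_strings,
  match    = target_strings[best],
  distance = d[cbind(seq_along(best), best)]
)
print(result)
```

Here "chery" is equally close to both copies of "cherry", and only the first copy is selected, which mirrors the first-match behavior stated in the description.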
fuzzy_match(a, b, acol, bcol, method = "jw", ...)
a | a source data.table
b | a target data.table
acol | column name in a
bcol | column name in b
method | method for computing string distances; the default "jw" is Jaro-Winkler
a data.table containing any blocking columns, the source column, the closest match in the target column, and the string distance for that match.
stringdistmatrix
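library(data.table)
set.seed(575)

# Two tables with two blocking columns and a fruit name to match on
# (stringr::fruit requires the stringr package to be installed)
DTA <- data.table(block1 = sample(LETTERS[1:4], 20, TRUE),
                  block2 = sample(LETTERS[1:4], 20, TRUE),
                  fruit = sample(stringr::fruit[1:12], 20, TRUE))
DTB <- data.table(block1 = sample(LETTERS[1:4], 20, TRUE),
                  block2 = sample(LETTERS[1:4], 20, TRUE),
                  fruit = sample(stringr::fruit[1:12], 20, TRUE))

# Match across the full tables, ignoring the blocking columns
fuzzy_match(DTA, DTB, "fruit", "fruit")

# Match within blocks: key both tables, then match each group of DTA
# against the rows of DTB that share its block1 and block2 values
setkey(DTA, block1, block2)
setkey(DTB, block1, block2)
DTA[ , fuzzy_match(.SD, b = DTB[.BY], "fruit", "fruit"),
    by = .(block1, block2)]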