When searching for matches in large sets, multiple results are often possible or even likely.
Especially when matching on multiple criteria, it can be useful to start with a broad search.
An example is matching one list of people against another:
a first pass could match on family names, and later passes could add first names, place of origin, etc.
Or, if you are unsure which exact method to use, you can experiment with one first and then use others to narrow the results further, without
having to check your entire dataset again.
This function therefore returns the most likely matches: the maxmatch lowest-distance matches that are at most maxDist away.
For ties, the matches that occur first in table are returned.
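The selection rule described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: closest_matches is a hypothetical helper, utils::adist (plain edit distance) stands in for the stringdist distance methods, and the maxDist and maxmatch values are chosen for the toy data.

```r
# Hypothetical helper (not part of any package): for one query string, return
# the indices of up to `maxmatch` entries of `table` that lie within `maxDist`,
# closest first.
closest_matches <- function(query, table, maxDist = 2, maxmatch = 3) {
  # Base-R stand-in for the distance computation (generalized Levenshtein)
  d <- as.vector(utils::adist(query, table))
  # order() leaves ties in their original order, so earlier table entries win
  ranked <- order(d)
  # Drop everything further away than maxDist, then cap at maxmatch results
  ranked <- ranked[d[ranked] <= maxDist]
  head(ranked, maxmatch)
}

table <- c("jansen", "janssen", "jensen", "pietersen", "jansen")
closest_matches("jansen", table)
# 1 5 2: both exact matches in table order, then the first of the distance-1 ties
```

Raising maxmatch to 4 would also return index 3, while "pietersen" is never returned because it is further than maxDist away.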
mamatch(x, table, nomatch = NA, matchNA = TRUE, method = c("osa",
  "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw",
  "soundex"), useBytes = FALSE, weight = c(d = 1, i = 1, s = 1, t = 1),
  maxDist = 0.1, q = 1, p = 0, bt = 0,
  nthread = getOption("sd_num_thread"), maxmatch = 10, limitMem = 0,
  returnAs = c("matrix", "list"), dupls = TRUE)
x, table, matchNA, method, useBytes, weight, maxDist, q, p, bt, nthread | See
nomatch | See also amatch, but for returnAs=='list', it can be NULL
maxmatch | Maximum number of matches to return.
limitMem | Limit memory usage. For large x and table, a lot of memory is needed for the matrix with distances. (Internally, this script calls stringdistmatrix, which means a matrix of
returnAs | Comparable to simplify in sapply: should the result be returned as a list or a matrix?
dupls | Are duplicates possibly present? Decides what kind of algorithm is used.
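The memory concern noted for limitMem can be made concrete with a little arithmetic. The sketch below only assumes that a full distance matrix stores one 8-byte double per pair of strings; dist_matrix_bytes is a hypothetical helper, and how limitMem itself reduces memory use is not specified here.

```r
# Hypothetical helper for illustration: approximate size in bytes of a dense
# numeric distance matrix with n_x rows and n_table columns (8 bytes per double).
dist_matrix_bytes <- function(n_x, n_table) {
  as.numeric(n_x) * as.numeric(n_table) * 8
}

dist_matrix_bytes(50, 200)         # 80000 bytes for 50 queries against 200 entries
dist_matrix_bytes(1e5, 1e5) / 1e9  # 80, i.e. about 80 GB for 100,000 x 100,000
```

The quadratic growth is the reason a memory limit becomes relevant once both x and table are large.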
For returnAs=="list", a list of length(x), with elements of length between length(nomatch) and maxmatch,
containing the indices of the closest matches in table.
For returnAs=="matrix", a matrix with length(x) columns and maxmatch rows (even if no element has that many matches).
Non-matches are filled in with nomatch.
In both cases, the first match gets priority in case of ties.
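To make the two return shapes concrete, here is a small sketch built on a made-up result rather than on real mamatch output. The index values are invented for illustration, and the conversion only covers queries that have at least one match (per the text above, an element without matches would hold nomatch instead).

```r
# Toy result in the "matrix" layout: maxmatch = 3 rows, length(x) = 2 columns,
# padded with nomatch = NA. matrix() fills column by column.
m <- matrix(c(4L, 9L, NA,
              7L, NA, NA), nrow = 3)

# Corresponding "list" layout: one element per entry of x, NA padding dropped
as_list <- lapply(seq_len(ncol(m)), function(j) m[!is.na(m[, j]), j])

# The first row holds the single best match per query
best <- m[1, ]
```

Here as_list holds c(4, 9) for the first query and 7 for the second, and best is c(4, 7).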
set.seed(1)
# 50 query strings and 200 candidate strings, each 20 random lowercase letters
x <- replicate(paste(letters[ceiling(runif(n = 20)*26)], collapse=''), n = 50)
table <- replicate(paste(letters[ceiling(runif(n = 20)*26)], collapse=''), n = 200)
normal_amatch <- stringdist::amatch(x, table, method='jw', p=.1, maxDist=.5)
multi_match <- mamatch(x, table, method='jw', p=.1, maxDist = .5, maxmatch=10, returnAs='matrix')
# The first row of the matrix should hold the same best matches as amatch
print(identical(normal_amatch, multi_match[1,]))
# What do the closest matches for number 1 look like?
print(x[1])
print(table[multi_match[,1]])
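The staged workflow from the description (a broad match on family names, then narrowing on first names) could look roughly like the sketch below. It is an illustration under assumptions, not output of mamatch: the two data frames are made up, utils::adist stands in for the stringdist methods, and the distance threshold of 1 is arbitrary.

```r
# Two made-up lists of people
people <- data.frame(first = c("Anna", "Jan"), family = c("Jansen", "Smit"),
                     stringsAsFactors = FALSE)
registry <- data.frame(first  = c("Hanna", "Anna", "Jan", "Johan"),
                       family = c("Janssen", "Jansen", "Smit", "Smits"),
                       stringsAsFactors = FALSE)

# Stage 1 (broad): per person, keep the registry rows whose family name is
# within edit distance 1
fam_d <- utils::adist(people$family, registry$family)
candidates <- lapply(seq_len(nrow(people)), function(i) which(fam_d[i, ] <= 1))

# Stage 2 (narrow): among those candidates only, pick the closest first name
refined <- mapply(function(i, cand) {
  d <- utils::adist(people$first[i], registry$first[cand])
  cand[which.min(d)]
}, seq_len(nrow(people)), candidates)
```

The second stage only computes distances for the few candidates kept by the first stage, which is the point of returning several matches instead of one.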