| amatch | R Documentation |
Approximate string matching equivalents of R's native
match and %in%.
amatch(
x,
table,
nomatch = NA_integer_,
matchNA = TRUE,
method = c("osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw",
"soundex"),
useBytes = FALSE,
weight = c(d = 1, i = 1, s = 1, t = 1),
maxDist = 0.1,
q = 1,
p = 0,
bt = 0,
nthread = getOption("sd_num_thread")
)
ain(x, table, ...)
x |
elements to be approximately matched: will be coerced to
|
table |
lookup table for matching. Will be coerced to |
nomatch |
The value to be returned when no match is found. This is coerced to integer. |
matchNA |
Should |
method |
Matching algorithm to use. See |
useBytes |
Perform byte-wise comparison. See |
weight |
For |
maxDist |
Elements in |
q |
q-gram size, only when method is |
p |
Winkler's 'prefix' parameter for Jaro-Winkler distance, with
|
bt |
Winkler's boost threshold. Winkler's prefix factor is
only applied when the Jaro distance is larger than |
nthread |
Number of threads used by the underlying C-code. A sensible
default is chosen, see |
... |
parameters to pass to |
ain is currently defined as
ain <- function(x,table,...) amatch(x, table, nomatch=0,...) > 0
amatch returns the position of the closest match of x
in table. When multiple matches share the same smallest distance,
the first one is returned. ain returns a
logical vector of length length(x) indicating whether each
element of x approximately matches an element in table.
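The return values described above can be illustrated with a small sketch. It assumes the stringdist package is installed; the vectors x and tbl are made up for illustration, and the default method ("osa") is used.

```r
library(stringdist)

# Hypothetical lookup table and query vector, chosen for illustration only.
tbl <- c("ac", "ad", "xyz")
x   <- c("ab", "xyz", "hello")

# "ab" is at distance 1 from both "ac" and "ad"; the first one is returned.
# "hello" has no entry within maxDist, so nomatch (NA by default) is returned.
pos <- amatch(x, tbl, maxDist = 1)
print(pos)

# ain gives one logical per element of x.
found <- ain(x, tbl, maxDist = 1)
print(found)

# ain should agree with the amatch-based definition given above.
same <- identical(found, amatch(x, tbl, nomatch = 0, maxDist = 1) > 0)
print(same)
```

Passing nomatch = 0 is what makes the comparison with 0 well defined: positions are always at least 1, so only unmatched elements give FALSE.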
NA handling
R's native match function matches NA with
NA. This may feel inconsistent with R's usual NA
handling, since for example NA==NA yields
NA rather than TRUE. In most cases, one may reason about the
behaviour under NA along the lines of “if one of the arguments is
NA, the result shall be NA”, simply because not all
information necessary to execute the function is available. One uses special
functions such as is.na, is.null etc. to handle special
values.
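The contrast between match and R's usual NA propagation can be checked in base R alone. This is a minimal sketch; the table tbl is a made-up example.

```r
# Hypothetical lookup table containing a missing value.
tbl <- c("a", NA)

# Comparison propagates NA: the result is NA, not TRUE.
eq <- NA == NA
print(eq)

# match, in contrast, treats NA as matching NA and returns its position.
m <- match(NA, tbl)
print(m)

# Special values are tested with dedicated functions such as is.na.
isna <- is.na(NA)
print(isna)
```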
The amatch function mimics the behaviour of match
by default: NA is matched with NA and with nothing else. Note
that this is inconsistent with the behaviour of stringdist
since stringdist yields NA when at least one of the arguments
is NA. The same inconsistency exists between match
and adist. In amatch this behaviour can be
controlled by setting matchNA=FALSE. In that case, if any
element of x is NA, the nomatch value is returned for that element,
regardless of whether NA is present in table. In
match the behaviour can be controlled by setting the
incomparables option.
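The NA behaviour described in this section can be sketched as follows. The example assumes the stringdist package is installed; tbl is a made-up lookup table, and the values noted in the comments are what the text above leads one to expect rather than recorded output.

```r
library(stringdist)

# Hypothetical lookup table containing a missing value.
tbl <- c("a", NA)

# Default (matchNA = TRUE): NA is matched with the NA in the table.
a_default <- amatch(NA, tbl)
print(a_default)

# matchNA = FALSE: the nomatch value is returned, even though NA is in tbl.
a_strict <- amatch(NA, tbl, matchNA = FALSE)
a_zero   <- amatch(NA, tbl, matchNA = FALSE, nomatch = 0)
print(c(a_strict, a_zero))

# stringdist yields NA when at least one argument is NA.
d <- stringdist(NA, "a")
print(d)

# Base R's match can exclude NA from matching via incomparables.
m <- match(NA, tbl, incomparables = NA)
print(m)
```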
Other matching:
afind()
# let's see which sci-fi heroes are stringdistantly nearest
amatch("leia",c("uhura","leela"),maxDist=5)
# we can restrict the search
amatch("leia",c("uhura","leela"),maxDist=1)
# we can match each value in the find vector against values in the lookup table:
amatch(c("leia","uhura"),c("ripley","leela","scully","trinity"),maxDist=2)
# setting nomatch returns a different value when no match is found
amatch("leia",c("uhura","leela"),maxDist=1,nomatch=0)
# this is always true if maxDist is Inf
ain("leia",c("uhura","leela"),maxDist=Inf)
# Let's look in a neighbourhood of at most 2 typos (by default, the OSA algorithm is used)
ain("leia",c("uhura","leela"), maxDist=2)