cmp_identical | R Documentation |
Comparison functions
cmp_identical()
cmp_jarowinkler(threshold = 0.95)
jaro_winkler(threshold = 0.8)
cmp_lcs(threshold = 0.8)
lcs(threshold = 0.8)
cmp_jaccard(threshold = 0.8)
jaccard(threshold = 0.8)
threshold | threshold to use for the Jaro-Winkler string distance when creating a binary result. |
A comparison function should accept two arguments, both vectors. When the function is called with both arguments, it should compare the elements of the first vector to those of the second; in that case both vectors have the same length. What the function should return depends on the method used to score the pairs. Usually a comparison function returns a similarity score, with a value of 0 indicating complete difference and a value > 0 indicating similarity (often a value of 1 indicates perfect similarity).
Some methods, such as problink_em, can handle similarity scores, but also need binary values (0/FALSE = complete dissimilarity; 1/TRUE = complete similarity). In order to allow for this the comparison function is called with one argument.
When the comparison function is called with one argument, it is passed the result of a previous comparison. The function should translate that result to a binary (TRUE/FALSE or 1/0) result. The result should not contain missing values.
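The contract described above can be sketched with a small custom comparison function. This is only an illustration in base R: the name cmp_abs_diff and its scale argument are hypothetical and not part of the package, and the use of missing(y) to detect the one-argument call is an assumption about how such a function might be written.

```r
# Hypothetical factory for a numeric comparison function (not part of the package).
# 'scale' is an assumed parameter: differences of 'scale' or more give similarity 0.
cmp_abs_diff <- function(threshold = 0.8, scale = 10) {
  function(x, y) {
    if (missing(y)) {
      # One argument: x holds the scores of a previous comparison.
      # Translate them to TRUE/FALSE; NA scores become FALSE so that
      # the result contains no missing values.
      !is.na(x) & x >= threshold
    } else {
      # Two arguments: element-wise similarity score between 0 and 1.
      pmax(0, 1 - abs(x - y) / scale)
    }
  }
}

cmp_num <- cmp_abs_diff(threshold = 0.8, scale = 10)
scores <- cmp_num(c(30, 41, 25, 60), c(30, 42, 40, NA))  # similarity scores, NA kept
binary <- cmp_num(scores)                                 # logical, no NA
```

Here the two-argument call may return NA for a pair with a missing value, while the one-argument call maps that NA to FALSE.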
The jaro_winkler, lcs and jaccard functions use the corresponding methods from stringdist except that they are transformed from a distance to a similarity score.
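To illustrate what turning a distance into a similarity score can look like, here is a minimal base R sketch using a Jaccard distance on character 2-grams. The helper names qgrams, jaccard_distance and jaccard_similarity are hypothetical, the choice q = 2 is an assumption, and this is not the stringdist implementation; the exact transformation the package applies to each method may differ.

```r
# Illustrative helpers only; these are NOT the stringdist implementations.

# Distinct q-grams (substrings of length q) of a single string.
qgrams <- function(s, q = 2) {
  starts <- seq_len(max(0, nchar(s) - q + 1))
  unique(substring(s, starts, starts + q - 1))
}

# Jaccard distance: 1 - |intersection| / |union| of the q-gram sets.
# Assumes both strings have at least q characters.
jaccard_distance <- function(a, b, q = 2) {
  ga <- qgrams(a, q)
  gb <- qgrams(b, q)
  1 - length(intersect(ga, gb)) / length(union(ga, gb))
}

# Similarity as one minus the distance: 1 = identical sets, 0 = no overlap.
jaccard_similarity <- function(a, b, q = 2) {
  1 - jaccard_distance(a, b, q)
}

sim_same  <- jaccard_similarity("mary", "mary")   # identical strings
sim_close <- jaccard_similarity("john", "johan")  # 2 shared of 5 distinct 2-grams
```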
The functions return a comparison function (see details).
The functions identical, jaro_winkler, lcs and jaccard are deprecated and will be removed in future versions of the package. Instead use the functions cmp_identical, cmp_jarowinkler, cmp_lcs and cmp_jaccard.
cmp <- cmp_identical()
x <- cmp(c("john", "mary", "susan", "jack"),
c("johan", "mary", "susanna", NA))
# Applying the comparison function to the result of the comparison results
# in a logical result, with NA's and values of FALSE set to FALSE
cmp(x)
cmp <- cmp_jarowinkler(0.95)
x <- cmp(c("john", "mary", "susan", "jack"),
c("johan", "mary", "susanna", NA))
# Applying the comparison function to the result of the comparison results
# in a logical result, with NA's and values below the threshold FALSE
cmp(x)