Hamming | R Documentation |
The Hamming distance between two strings/sequences of equal length is the number of positions where the corresponding characters/sequence elements differ. It can be viewed as a type of edit distance where the only permitted operation is substitution of characters/sequence elements.
Hamming( normalize = FALSE, similarity = FALSE, ignore_case = FALSE, use_bytes = FALSE )
normalize |
a logical. If TRUE, distances/similarities are normalized to the unit interval. Defaults to FALSE. |
similarity |
a logical. If TRUE, similarity scores are returned instead of distances. Defaults to FALSE. |
ignore_case |
a logical. If TRUE, case is ignored when comparing strings. Defaults to FALSE. |
use_bytes |
a logical. If TRUE, strings are compared byte-by-byte rather than character-by-character. Defaults to FALSE. |
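The difference between character-wise and byte-wise comparison, and the effect of ignoring case, can be illustrated with base R alone. The sketch below does not call the package; it assumes the strings are UTF-8 encoded, and the variable names are illustrative only. Because the two strings have different byte lengths, a byte-wise comparison would presumably fall under the unequal-length rule described on this page, although that is an inference rather than verified package behavior.

```r
# Illustration only, using base R (no package calls).
x <- enc2utf8("caf\u00e9")  # "cafe" with an accented final letter, forced to UTF-8
y <- "cafe"

# Character view: compare Unicode code points.
chars_x <- utf8ToInt(x)
chars_y <- utf8ToInt(y)
length(chars_x) == length(chars_y)  # TRUE: both have 4 characters
sum(chars_x != chars_y)             # 1: only the last character differs

# Byte view: the accented letter occupies two bytes in UTF-8.
bytes_x <- charToRaw(x)
bytes_y <- charToRaw(y)
length(bytes_x)                     # 5
length(bytes_y)                     # 4

# Ignoring case amounts to comparing case-folded strings.
tolower("ZIP") == tolower("zip")    # TRUE
```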
When the input strings/sequences x and y are of different lengths (|x| != |y|), the Hamming distance is defined to be Inf.
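As a rough illustration of this definition, the following base R sketch computes the Hamming distance for a single pair of strings. The helper name hamming_dist is hypothetical and is not part of the package; it only mirrors the rule that inputs of unequal length give Inf.

```r
# Hypothetical helper, not part of the package: Hamming distance between
# two single strings, compared character by character.
hamming_dist <- function(x, y) {
  xs <- strsplit(x, "", fixed = TRUE)[[1]]
  ys <- strsplit(y, "", fixed = TRUE)[[1]]
  if (length(xs) != length(ys)) {
    return(Inf)      # unequal lengths: distance is defined to be Inf
  }
  sum(xs != ys)      # count the positions where the characters differ
}

hamming_dist("90001", "90209")  # 2 (positions 3 and 5 differ)
hamming_dist("90001", "9000")   # Inf (lengths differ)
```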
A Hamming similarity is returned if similarity = TRUE. When |x| = |y| the similarity is defined as follows:
sim(x, y) = |x| - dist(x, y),
where sim is the Hamming similarity and dist is the Hamming distance. When |x| != |y| the similarity is defined to be 0.
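A small worked example, written in base R rather than with the package, may make the formula concrete. For equal-length inputs, the similarity computed this way is simply the number of matching positions.

```r
# Worked example of sim(x, y) = |x| - dist(x, y) for equal-length strings.
x <- "90001"
y <- "90209"
xs <- strsplit(x, "", fixed = TRUE)[[1]]
ys <- strsplit(y, "", fixed = TRUE)[[1]]

len_x <- length(xs)    # |x| = 5
d <- sum(xs != ys)     # dist(x, y) = 2
s <- len_x - d         # sim(x, y) = 5 - 2 = 3

s == sum(xs == ys)     # TRUE: similarity equals the number of matching positions
```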
Normalization of the Hamming distance/similarity to the unit interval is also supported by setting normalize = TRUE. The raw distance/similarity is divided by the length of the string/sequence |x| = |y|. If |x| != |y| the normalized distance is defined to be 1, while the normalized similarity is defined to be 0.
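The normalization rule can be sketched in base R as follows. The function hamming_norm and its similarity argument are hypothetical names chosen for illustration and are not the package's implementation. The sketch does not address empty strings, where the division would be 0/0.

```r
# Hypothetical helper, not part of the package: normalized Hamming
# distance or similarity for a single pair of non-empty strings.
hamming_norm <- function(x, y, similarity = FALSE) {
  xs <- strsplit(x, "", fixed = TRUE)[[1]]
  ys <- strsplit(y, "", fixed = TRUE)[[1]]
  if (length(xs) != length(ys)) {
    # Unequal lengths: normalized distance is 1, normalized similarity is 0.
    return(if (similarity) 0 else 1)
  }
  d <- sum(xs != ys)                              # raw distance
  raw <- if (similarity) length(xs) - d else d    # raw similarity or distance
  raw / length(xs)                                # divide by the common length
}

hamming_norm("90001", "90209")                     # 0.4
hamming_norm("90001", "90209", similarity = TRUE)  # 0.6
hamming_norm("90001", "9000")                      # 1
hamming_norm("90001", "9000", similarity = TRUE)   # 0
```

For equal-length inputs, the normalized distance and the normalized similarity computed this way sum to 1.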
A Hamming instance is returned, which is an S4 class inheriting from StringComparator.
While the unnormalized Hamming distance is a metric, the normalized variant is not as it does not satisfy the triangle inequality.
Other edit-based comparators include LCS, Levenshtein, OSA and DamerauLevenshtein.
## Compare US ZIP codes
x <- "90001"
y <- "90209"
m1 <- Hamming()                                     # unnormalized distance
m2 <- Hamming(similarity = TRUE, normalize = TRUE)  # normalized similarity
m1(x, y)
m2(x, y)