knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
library(comparator)
comparator implements comparison functions for clustering and record linkage applications. It includes functions for comparing strings, sequences and numeric vectors. Where possible, comparators are implemented in C/C++ to ensure fast performance.
- Levenshtein(): Levenshtein distance/similarity
- DamerauLevenshtein(): Damerau-Levenshtein distance/similarity
- Hamming(): Hamming distance/similarity
- OSA(): Optimal String Alignment distance/similarity
- LCS(): Longest Common Subsequence distance/similarity
- Jaro(): Jaro distance/similarity
- JaroWinkler(): Jaro-Winkler distance/similarity

Not yet implemented.

- MongeElkan(): Monge-Elkan similarity
- FuzzyTokenSet(): Fuzzy Token Set distance
- InVocabulary(): Compares strings using a reference vocabulary. Useful for comparing names.
- Lookup(): Retrieves distances/similarities from a lookup table
- BinaryComp(): Compares strings based on whether they agree/disagree exactly.
- Euclidean(): Euclidean (L-2) distance
- Manhattan(): Manhattan (L-1) distance
- Chebyshev(): Chebyshev (L-∞) distance
- Minkowski(): Minkowski (L-p) distance

You can install the latest release from CRAN by entering:
install.packages("comparator")
The development version can be installed from GitHub using devtools:
# install.packages("devtools")
devtools::install_github("ngmarchant/comparator")
A comparator is instantiated by calling its constructor function. For example, we can instantiate a Levenshtein similarity comparator that ignores differences in upper/lowercase characters as follows:
comparator <- Levenshtein(similarity = TRUE, normalize = TRUE, ignore_case = TRUE)
We can apply the comparator to character vectors element-wise as follows:
x <- c("John Doe", "Jane Doe")
y <- c("jonathon doe", "jane doe")
elementwise(comparator, x, y)

# shorthand for above
comparator(x, y)
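For intuition about what an edit-based comparator measures, the same pairs can be cross-checked against base R's utils::adist(), which computes unit-cost Levenshtein distances. This is only a sketch: the rescaling to a similarity used here (one minus the distance divided by the longer string's length) is an assumed convention chosen for illustration, and the values comparator returns with normalize = TRUE may differ.

```r
x <- c("John Doe", "Jane Doe")
y <- c("jonathon doe", "jane doe")

# adist() returns the full distance matrix; its diagonal holds the
# element-wise distances between x[i] and y[i]
dist_xy <- diag(utils::adist(x, y, ignore.case = TRUE))

# Assumed normalization (illustrative only, not necessarily comparator's)
sim_xy <- 1 - dist_xy / pmax(nchar(x), nchar(y))

print(dist_xy)
print(sim_xy)
```

With case ignored, the second pair is identical, so its distance is zero and its rescaled similarity is one.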
This comparator is also defined on sequences:
x_seq <- list(c(1, 2, 1, 1), c(1, 2, 3, 4))
y_seq <- list(c(4, 3, 2, 1), c(1, 2, 3, 1))
elementwise(comparator, x_seq, y_seq)

# shorthand for above
comparator(x_seq, y_seq)
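To illustrate why an edit distance carries over from strings to sequences, here is a minimal base R sketch of the standard unit-cost Levenshtein dynamic program over atomic vectors. The helper seq_levenshtein() is hypothetical and not part of comparator, whose own implementation is in C/C++ and may differ in detail. It returns a raw distance, not the normalized similarity configured above.

```r
# Hypothetical helper (not part of comparator): unit-cost Levenshtein
# distance between two atomic vectors via the standard dynamic program.
seq_levenshtein <- function(a, b) {
  n <- length(a)
  m <- length(b)
  # prev[j + 1] is the distance between the first i - 1 elements of a
  # and the first j elements of b
  prev <- 0:m
  for (i in seq_len(n)) {
    curr <- c(i, integer(m))
    for (j in seq_len(m)) {
      cost <- if (a[i] == b[j]) 0L else 1L
      curr[j + 1] <- min(prev[j + 1] + 1L,  # deletion
                         curr[j] + 1L,      # insertion
                         prev[j] + cost)    # substitution or match
    }
    prev <- curr
  }
  prev[m + 1]
}

x_seq <- list(c(1, 2, 1, 1), c(1, 2, 3, 4))
y_seq <- list(c(4, 3, 2, 1), c(1, 2, 3, 1))

# Element-wise distances between corresponding sequences
seq_dist <- mapply(seq_levenshtein, x_seq, y_seq)
print(seq_dist)
```

Nothing in the recurrence depends on the elements being characters. It only needs an equality test, which is why the same comparator can accept both strings and sequences.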
Pairwise comparisons are also supported using the following syntax:
# compare each string in x with each string in y and return a similarity matrix
pairwise(comparator, x, y, return_matrix = TRUE)

# compare the strings in x pairwise and return a similarity matrix
pairwise(comparator, x, return_matrix = TRUE)
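The shape of a pairwise result can be previewed with base R alone. The sketch below uses utils::adist(), so it produces raw Levenshtein distances, not the normalized similarities that the comparator above is configured to return. It is meant only to show how the matrix is laid out.

```r
x <- c("John Doe", "Jane Doe")
y <- c("jonathon doe", "jane doe")

# Entry [i, j] is the distance between x[i] and y[j]
dist_matrix <- utils::adist(x, y, ignore.case = TRUE)

# Comparing x with itself gives a symmetric matrix with a zero diagonal
self_matrix <- utils::adist(x, ignore.case = TRUE)

print(dist_matrix)
print(self_matrix)
```

The diagonal of the first matrix holds the distances between x[i] and y[i], the same pairs that an element-wise comparison covers.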