
comparator: Comparison Functions for Clustering and Record Linkage

comparator implements comparison functions for clustering and record linkage applications. It includes functions for comparing strings, sequences and numeric vectors. Where possible, comparators are implemented in C/C++ to ensure fast performance.

Supported comparators

String comparators:

Edit-based:

Levenshtein(): Levenshtein distance/similarity
DamerauLevenshtein(): Damerau-Levenshtein distance/similarity
Hamming(): Hamming distance/similarity
OSA(): Optimal String Alignment distance/similarity
LCS(): Longest Common Subsequence distance/similarity
Jaro(): Jaro distance/similarity
JaroWinkler(): Jaro-Winkler distance/similarity
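
To make the edit-based family concrete, here is a minimal base-R sketch of two of the distances listed above. It does not use the comparator package or its C/C++ code: `utils::adist()` stands in for Levenshtein distance, and `hamming()` is a hypothetical helper written for this illustration that assumes both strings have the same length.

```r
# Base-R illustration only; not the comparator package's implementation.

# Levenshtein distance: minimum number of single-character insertions,
# deletions and substitutions needed to turn one string into the other.
lev <- drop(utils::adist("kitten", "sitting"))
lev
#> [1] 3

# Hamming distance: number of positions at which two equal-length strings
# differ. `hamming()` is a hypothetical helper, not a package function.
hamming <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}
hamming("karolin", "kathrin")
#> [1] 3
```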

Token-based:

Not yet implemented.

Hybrid token-character:

MongeElkan(): Monge-Elkan similarity
FuzzyTokenSet(): Fuzzy Token Set distance

Other:

InVocabulary(): Compares strings using a reference vocabulary. Useful for comparing names.
Lookup(): Retrieves distances/similarities from a lookup table
BinaryComp(): Compares strings based on whether they agree/disagree exactly.

Numeric comparators:

Euclidean(): Euclidean (L-2) distance
Manhattan(): Manhattan (L-1) distance
Chebyshev(): Chebyshev (L-∞) distance
Minkowski(): Minkowski (L-p) distance
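
As a rough guide to what the four numeric distances measure, the following base-R sketch computes each one for a single pair of vectors. It is an illustration under textbook definitions, not the package's code; `minkowski()` is a hypothetical helper and the example vectors are made up.

```r
# Base-R illustration only; not the comparator package's implementation.
x <- c(1, 2, 3)
y <- c(4, 6, 3)
d <- abs(x - y)              # coordinate-wise absolute differences: 3 4 0

euclidean <- sqrt(sum(d^2))  # L-2: square root of the summed squares
manhattan <- sum(d)          # L-1: sum of absolute differences
chebyshev <- max(d)          # L-infinity: largest absolute difference

# L-p: generalizes the cases above (p = 1 is Manhattan, p = 2 is Euclidean).
# `minkowski()` is a hypothetical helper, not a package function.
minkowski <- function(d, p) sum(d^p)^(1 / p)

c(euclidean = euclidean, manhattan = manhattan, chebyshev = chebyshev)
#> euclidean manhattan chebyshev 
#>         5         7         4
```

As `p` grows, `minkowski(d, p)` approaches the Chebyshev value, which is why Chebyshev is written as L-∞.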

Installation

You can install the latest release from CRAN by entering:

install.packages("comparator")

The development version can be installed from GitHub using devtools:

# install.packages("devtools")
devtools::install_github("ngmarchant/comparator")

Example

A comparator is instantiated by calling its constructor function. For example, we can instantiate a Levenshtein similarity comparator that ignores differences in upper/lowercase characters as follows:

comparator <- Levenshtein(similarity = TRUE, normalize = TRUE, ignore_case = TRUE)

We can apply the comparator to character vectors element-wise as follows:

x <- c("John Doe", "Jane Doe")
y <- c("jonathon doe", "jane doe")
elementwise(comparator, x, y)
#> [1] 0.6666667 1.0000000

# shorthand for above
comparator(x, y)
#> [1] 0.6666667 1.0000000

This comparator is also defined on sequences:

x_seq <- list(c(1, 2, 1, 1), c(1, 2, 3, 4))
y_seq <- list(c(4, 3, 2, 1), c(1, 2, 3, 1))
elementwise(comparator, x_seq, y_seq)
#> [1] 0.4545455 0.7777778

# shorthand for above
comparator(x_seq, y_seq)
#> [1] 0.4545455 0.7777778

Pairwise comparisons are also supported using the following syntax:

# compare each string in x with each string in y and return a similarity matrix
pairwise(comparator, x, y, return_matrix = TRUE)
#>           [,1]      [,2]
#> [1,] 0.6666667 0.6842105
#> [2,] 0.5384615 1.0000000

# compare the strings in x pairwise and return a similarity matrix
pairwise(comparator, x, return_matrix = TRUE)
#>           [,1]      [,2]
#> [1,] 1.0000000 0.6842105
#> [2,] 0.6842105 1.0000000
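
The similarity values printed in this example can be cross-checked without the package. The base-R sketch below assumes that the normalized similarity equals 1 - 2d / (|x| + |y| + d), where d is the Levenshtein distance and |x|, |y| are the string lengths. That formula is an assumption inferred from the printed output, not something this README states; `utils::adist()` stands in for the package's Levenshtein distance and `tolower()` mimics `ignore_case = TRUE`.

```r
# Base-R cross-check; the normalization formula is an assumption.
x <- c("John Doe", "Jane Doe")
y <- c("jonathon doe", "jane doe")

# Case-insensitive Levenshtein distance between every x[i] and y[j]
d <- utils::adist(tolower(x), tolower(y))

# Sum of string lengths for every pair
len_sum <- outer(nchar(x), nchar(y), "+")

# Assumed normalized similarity: 1 - 2d / (|x| + |y| + d)
sim <- 1 - 2 * d / (len_sum + d)

sim        # compare with pairwise(comparator, x, y, return_matrix = TRUE)
diag(sim)  # compare with elementwise(comparator, x, y)
```

If the assumption holds, the diagonal of `sim` corresponds to the element-wise result and the full matrix to the pairwise result shown above.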


comparator documentation built on March 18, 2022, 6:15 p.m.
