stringdist: Approximate String Matching and String Distance Functions

Implements an approximate string matching version of R's native 'match' function. It can compute various string distances based on edits (Damerau-Levenshtein, Hamming, Levenshtein, optimal string alignment), q-grams (q-gram, cosine, Jaccard distance) or heuristic metrics (Jaro, Jaro-Winkler). An implementation of soundex is provided as well. Distances can be computed between character vectors, with proper handling of encoding, or between integer vectors representing generic sequences.
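
The edit-based distances listed above differ mainly in which edit operations they count. As an illustration only (this is not the package's C implementation, and the function names `levenshtein` and `osa` are hypothetical), the Python sketch below computes the Levenshtein distance (insertions, deletions, substitutions) and the optimal string alignment distance, which additionally counts a swap of two adjacent characters as a single edit, using the usual dynamic-programming table.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def osa(a: str, b: str) -> int:
    """Levenshtein plus transposition of adjacent characters.

    Each substring may be edited at most once (restricted Damerau-Levenshtein).
    """
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[n][m]


print(levenshtein("ab", "ba"), osa("ab", "ba"))    # 2 1
print(levenshtein("ca", "abc"), osa("ca", "abc"))  # 3 3
```

The second pair shows why the restricted and full variants differ: optimal string alignment may not edit the transposed "ac" again, so it gives 3, whereas the full Damerau-Levenshtein distance between "ca" and "abc" is 2.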

Author
Mark van der Loo [aut, cre], Jan van der Laan [ctb], R Core Team [ctb], Nick Logan [ctb]
Date of publication
2016-09-09 23:46:16
Maintainer
Mark van der Loo <mark.vanderloo@gmail.com>
License
GPL-3
Version
0.9.4.2

Man pages

amatch
Approximate string matching
phonetic
Phonetic algorithms
printable_ascii
Detect the presence of non-printable or non-ascii characters
qgrams
Get a table of qgram counts from one or more character...
seq_amatch
Approximate matching for integer sequences.
seq_dist
Compute distance metrics between integer sequences
seq_qgrams
Get a table of qgram counts for integer sequences
seq_sim
Compute similarity scores between sequences of integers
stringdist
Compute distance metrics between strings
stringdist-encoding
String metrics in 'stringdist'
stringdist-metrics
String metrics in 'stringdist'
stringdist-package
A package for string distance calculation and approximate...
stringdist-parallelization
Multithreading and parallelization in 'stringdist'
stringsim
Compute similarity scores between strings
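
To make the `qgrams` and `amatch` entries above concrete, here is a minimal Python sketch, not the package's API. The helper names `qgram_profile`, `qgram_dist` and `amatch_sketch` are hypothetical. It builds a q-gram count table, takes the q-gram distance as the sum of absolute count differences over all q-grams, and mimics `match`-style lookup: it returns the 1-based position of the closest table entry within `max_dist` (the first one on ties) or `None` where R would return `NA`. Using the q-gram distance for matching is a choice made for this sketch.

```python
from collections import Counter
from typing import Optional, Sequence


def qgram_profile(s: str, q: int = 2) -> Counter:
    """Count every substring of length q in s (empty if s is shorter than q)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_dist(a: str, b: str, q: int = 2) -> int:
    """Sum of absolute differences between the q-gram counts of a and b."""
    pa, pb = qgram_profile(a, q), qgram_profile(b, q)
    # A Counter returns 0 for missing keys, so q-grams present in only one string count fully.
    return sum(abs(pa[g] - pb[g]) for g in pa.keys() | pb.keys())


def amatch_sketch(x: str, table: Sequence[str], max_dist: int, q: int = 2) -> Optional[int]:
    """1-based index of the closest entry of table within max_dist, else None."""
    best_idx, best_d = None, None
    for idx, candidate in enumerate(table, start=1):
        d = qgram_dist(x, candidate, q)
        # Strict '<' keeps the first entry when several share the smallest distance.
        if d <= max_dist and (best_d is None or d < best_d):
            best_idx, best_d = idx, d
    return best_idx


print(qgram_dist("leia", "leela"))                              # 5
print(amatch_sketch("leia", ["uhura", "leela"], max_dist=5))    # 2
print(amatch_sketch("leia", ["uhura", "leela"], max_dist=4))    # None
```

Here "leia" and "leela" share only the bigram "le", leaving five unmatched bigrams, while "uhura" shares none with "leia" (distance 7), so only the second entry falls within a threshold of 5.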

Files in this package

stringdist
stringdist/inst
stringdist/inst/CITATION
stringdist/tests
stringdist/tests/testthat.R
stringdist/tests/testthat
stringdist/tests/testthat/testSeqDist.R
stringdist/tests/testthat/testPhonetic.R
stringdist/tests/testthat/testAmatch.R
stringdist/tests/testthat/testQgrams.R
stringdist/tests/testthat/testStringsim.R
stringdist/tests/testthat/testStringdist.R
stringdist/src
stringdist/src/Makevars
stringdist/src/Rstringdist.c
stringdist/src/utils.c
stringdist/src/stringdist.h
stringdist/src/dictionary.h
stringdist/src/qtree.h
stringdist/src/lv.c
stringdist/src/soundex.c
stringdist/src/osa.c
stringdist/src/utf8ToInt.c
stringdist/src/lcs.c
stringdist/src/utils.h
stringdist/src/dist.h
stringdist/src/stringdist.c
stringdist/src/dl.c
stringdist/src/qgram.c
stringdist/src/jaro.c
stringdist/src/hamming.c
stringdist/NAMESPACE
stringdist/NEWS
stringdist/R
stringdist/R/seqdist.R
stringdist/R/utils.R
stringdist/R/stringsim.R
stringdist/R/doc_metrics.R
stringdist/R/phonetic.R
stringdist/R/qgrams.R
stringdist/R/stringdist.R
stringdist/R/doc_parallel.R
stringdist/R/doc_encoding.R
stringdist/R/amatch.R
stringdist/MD5
stringdist/DESCRIPTION
stringdist/man
stringdist/man/stringdist-package.Rd
stringdist/man/stringdist-metrics.Rd
stringdist/man/stringdist-parallelization.Rd
stringdist/man/seq_amatch.Rd
stringdist/man/amatch.Rd
stringdist/man/seq_qgrams.Rd
stringdist/man/qgrams.Rd
stringdist/man/printable_ascii.Rd
stringdist/man/stringsim.Rd
stringdist/man/phonetic.Rd
stringdist/man/stringdist.Rd
stringdist/man/seq_sim.Rd
stringdist/man/stringdist-encoding.Rd
stringdist/man/seq_dist.Rd