stringdist-package: A package for string distance calculation and approximate...



The stringdist package offers fast and platform-independent string metrics. Its main purpose is to compute various string distances and to do approximate text matching between character vectors. As of version 0.9.3, it is also possible to compute distances between sequences represented by integer vectors.


A typical use is to match strings that are not precisely the same. For example, matching c("hello","g'day") against a lookup table with amatch returns c(2,NA), since "hello" matches closest with "hallo" (the second table entry) and lies within the maximum (optimal string alignment) distance. The second element, "g'day", matches closest with "ola", but since that distance equals 4 and exceeds the maximum, no match is reported.
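
The matching example described above can be sketched as follows. The lookup table c("hi","hallo","ola") and the threshold maxDist = 2 are assumptions, chosen to be consistent with the reported result; any threshold of at least 1 and below 4 should give the same output.

```r
library(stringdist)

# Lookup table; the entries are assumed, chosen to agree with the text above.
lookup <- c("hi", "hallo", "ola")

# maxDist = 2 is an assumed threshold. The default method is "osa"
# (optimal string alignment).
result <- amatch(c("hello", "g'day"), lookup, maxDist = 2)
print(result)
```

The result holds, for each input string, the index of the closest table entry, or NA when no entry lies within maxDist.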

A second typical use is to compute string distances. For example, calling stringdist on "g'day" and c("hi","hallo","ola") returns c(5,5,4), since these are the distances between "g'day" and, respectively, "hi", "hallo", and "ola".
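
A minimal sketch of the distance computation described above, assuming the default method ("osa") and the strings named in the text:

```r
library(stringdist)

# Distances from "g'day" to each candidate string. The single first
# argument is recycled over the longer second argument.
d <- stringdist("g'day", c("hi", "hallo", "ola"))
print(d)
```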

A third typical use is to compute a dist object. Calling stringdistmatrix with a single character vector returns an object of class dist that can be used by clustering algorithms such as stats::hclust.
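
The following sketch shows how such a dist object might be passed to a clustering routine. The input strings are hypothetical and serve only as an illustration.

```r
library(stringdist)

# Illustrative input strings (hypothetical, chosen for this sketch).
words <- c("foo", "bar", "boo", "baz")

# With a single argument, stringdistmatrix() returns a 'dist' object
# holding the pairwise distances (default method "osa").
d <- stringdistmatrix(words)

# Hierarchical clustering on the pairwise string distances.
hc <- stats::hclust(d)
print(class(d))
print(hc$merge)
```

For four strings the dist object stores the six pairwise distances of the lower triangle.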

A fourth use is to compute string distances between general sequences, represented as integer vectors (which must be stored in a list):

seq_dist( list(c(1L,1L,2L)), list(c(1L,2L,1L),c(2L,3L,1L,2L)) )

The above code yields the vector c(1,2): the shorter first argument is recycled over the longer second argument.

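Besides documentation for each function, the main topics documented are:

stringdist-metrics: string metrics supported by the package
stringdist-encoding: how encoding is handled by the package
stringdist-parallelization: on multithreading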

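If you would like to cite this package, please cite the R Journal paper:

M.P.J. van der Loo (2014). The stringdist package for approximate string matching. The R Journal 6(1), 111-122.

Or use citation('stringdist') to get a BibTeX entry.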

stringdist documentation built on Sept. 9, 2021, 5:08 p.m.