stringsim: Compute similarity scores between strings
In stringdist: Approximate String Matching, Fuzzy Text Search, and String Distance Functions

Description

stringsim computes pairwise string similarities between elements of character vectors a and b, where the vector with fewer elements is recycled. stringsimmatrix computes the string similarity matrix, with rows corresponding to the elements of a and columns to the elements of b.
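As a rough illustration of how the two functions differ in the shape of their result, the sketch below assumes the stringdist package is installed; the input vectors are made up for the example.

```r
library(stringdist)

a <- c("apple", "banana", "cherry", "date")
b <- c("apple", "bandana")

# Pairwise: b is recycled to the length of a, giving one score per element of a
pairwise <- stringsim(a, b)
length(pairwise)

# Matrix: one row per element of a, one column per element of b
m <- stringsimmatrix(a, b)
dim(m)
```

The first element of the pairwise result compares "apple" with "apple" and should therefore be 1.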

Usage

stringsim(
  a,
  b,
  method = c("osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw",
    "soundex"),
  useBytes = FALSE,
  q = 1,
  ...
)

stringsimmatrix(
  a,
  b,
  method = c("osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw",
    "soundex"),
  useBytes = FALSE,
  q = 1,
  ...
)

Arguments

`a`	R object (target); will be converted by `as.character`.
`b`	R object (source); will be converted by `as.character`.
`method`	Method for distance calculation. The default is `"osa"`, see `stringdist-metrics`.
`useBytes`	Perform byte-wise comparison, see `stringdist-encoding`.
`q`	Size of the `q`-gram; must be nonnegative. Only applies to `method='qgram'`, `'jaccard'` or `'cosine'`.
`...`	Additional arguments, passed on to `stringdist` (by `stringsim`) or `stringdistmatrix` (by `stringsimmatrix`).

Details

The similarity is calculated by first computing the distance with stringdist, dividing that distance by the maximum possible distance, and subtracting the result from 1. This yields a score between 0 and 1, with 1 corresponding to complete similarity and 0 to complete dissimilarity. Note that complete similarity implies equality only for distances that satisfy the identity property. This is not the case for q-gram based distances, for example: with q=1, anagrams are completely similar. For distances where weights can be specified, the maximum distance is currently computed by assuming that all weights are equal to 1.
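The normalisation described above can be sketched in base R for the Levenshtein case. This is only an illustration, not the package's implementation: lv_similarity is a hypothetical helper, it uses utils::adist for the distance, and it assumes that with unit weights the maximum possible Levenshtein distance between two strings is the length of the longer one.

```r
# Hypothetical helper, not part of stringdist: Levenshtein similarity
# computed as 1 - distance / (maximum possible distance).
lv_similarity <- function(a, b) {
  a <- as.character(a)
  b <- as.character(b)
  # utils::adist returns a 1 x 1 matrix per pair; drop() turns it into a scalar
  d <- mapply(function(x, y) drop(utils::adist(x, y)), a, b, USE.NAMES = FALSE)
  # Assumed maximum distance with unit weights: length of the longer string
  max_d <- pmax(nchar(a), nchar(b))
  # Two empty strings have distance 0 and maximum 0; treat them as identical
  ifelse(max_d == 0, 1, 1 - d / max_d)
}

s1 <- lv_similarity("kitten", "sitting")  # distance 3, maximum 7
s2 <- lv_similarity("abc", "abc")         # identical strings
s3 <- lv_similarity("abc", "xyz")         # every character differs
```

Here s1 equals 1 - 3/7, s2 is 1 and s3 is 0, which matches the interpretation of 1 as complete similarity and 0 as complete dissimilarity.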

Value

stringsim returns a vector of similarities: values between 0 and 1, where 1 corresponds to perfect similarity (distance 0) and 0 to complete dissimilarity. NA is returned when stringdist returns NA. Distances equal to Inf are truncated to a similarity of 0. stringsimmatrix works the same way but, analogous to stringdistmatrix, returns a similarity matrix instead of a vector.
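The special cases above can be checked with a short sketch, assuming the stringdist package is installed. It relies on the Hamming distance being infinite for strings of unequal length; the variable names are only for illustration.

```r
library(stringdist)

# Hamming distance is not defined for strings of unequal length, so
# stringdist returns Inf; stringsim maps this to a similarity of 0
d_inf <- stringdist("abc", "abcd", method = "hamming")
s_inf <- stringsim("abc", "abcd", method = "hamming")

# A missing input propagates as NA
s_na <- stringsim(NA_character_, "abc")

# Identical strings have distance 0 and therefore similarity 1
s_one <- stringsim("abc", "abc")
```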

Examples



# Calculate the similarity using the default method of optimal string alignment
stringsim("ca", "abc")

# Calculate the similarity using the Jaro-Winkler method
# The p argument is passed on to stringdist
stringsim("MARTHA", "MATHRA", method = "jw", p = 0.1)

stringdist documentation built on April 12, 2025, 2:01 a.m.
