# stringsim: Compute similarity scores between strings

Package: stringdist (Approximate String Matching, Fuzzy Text Search, and String Distance Functions)

## Description

`stringsim` computes pairwise string similarities between elements of the `character` vectors `a` and `b`, where the vector with fewer elements is recycled. `stringsimmatrix` computes the string similarity matrix, with rows corresponding to `a` and columns corresponding to `b`.
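To make the difference between the two output shapes concrete, here is a minimal base R sketch that does not call stringdist at all. It substitutes `utils::adist()` (unit-cost Levenshtein distance, comparable to `method = "lv"`) for `stringdist`, and the helpers `lv_sim_pairwise()` and `lv_sim_matrix()` are hypothetical names used only for illustration. It assumes the maximum distance for a pair is the length of the longer string, which holds for Levenshtein distance with unit weights.

```r
# Hypothetical sketch (not stringdist code): pairwise, recycled comparison
# versus an all-pairs matrix, using utils::adist() as the distance.

# Pairwise: the shorter vector is recycled to the length of the longer one
lv_sim_pairwise <- function(a, b) {
  a <- as.character(a)
  b <- as.character(b)
  n <- max(length(a), length(b))
  a <- rep_len(a, n)
  b <- rep_len(b, n)
  # adist() returns a matrix, so compare one pair at a time
  d <- mapply(function(x, y) utils::adist(x, y)[1, 1], a, b, USE.NAMES = FALSE)
  dmax <- pmax(nchar(a), nchar(b))   # longest possible unit-cost distance
  sim <- 1 - d / dmax
  sim[which(dmax == 0)] <- 1         # two empty strings count as identical
  sim
}

# All pairs: rows follow `a`, columns follow `b`
lv_sim_matrix <- function(a, b) {
  a <- as.character(a)
  b <- as.character(b)
  d <- utils::adist(a, b)                   # length(a) x length(b) matrix
  dmax <- outer(nchar(a), nchar(b), pmax)   # same shape as d
  sim <- 1 - d / dmax
  sim[which(dmax == 0)] <- 1
  sim
}
```

For example, `lv_sim_pairwise("abc", c("abc", "abd", ""))` recycles `"abc"` and returns one value per pair (1, 2/3 and 0), whereas `lv_sim_matrix(c("abc", "abd"), c("abc", "xyz", "ab"))` returns a 2 x 3 matrix.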

## Usage

```r
stringsim(
  a,
  b,
  method = c("osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw",
    "soundex"),
  useBytes = FALSE,
  q = 1,
  ...
)

stringsimmatrix(
  a,
  b,
  method = c("osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw",
    "soundex"),
  useBytes = FALSE,
  q = 1,
  ...
)
```

## Arguments

- `a`: R object (target); will be converted by `as.character`.
- `b`: R object (source); will be converted by `as.character`.
- `method`: Method for distance calculation. The default is `"osa"`; see `stringdist-metrics`.
- `useBytes`: Perform byte-wise comparison; see `stringdist-encoding`.
- `q`: Size of the q-gram; must be nonnegative. Only applies to `method = 'qgram'`, `'jaccard'` or `'cosine'`.
- `...`: Additional arguments, passed on to `stringdist` (for `stringsim`) or `stringdistmatrix` (for `stringsimmatrix`).

## Details

The similarity is calculated by first computing the distance with `stringdist`, dividing it by the maximum possible distance, and subtracting the result from 1. This yields a score between 0 and 1, where 1 corresponds to complete similarity and 0 to complete dissimilarity. Note that a similarity of 1 implies that the strings are equal only for distances that satisfy the identity property (distance 0 if and only if the strings are equal). This is not the case for, e.g., q-gram based distances: with q = 1, for example, anagrams are completely similar. For distances where weights can be specified, the maximum distance is currently computed by assuming that all weights are equal to 1.
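The normalization, and the caveat about q-gram distances, can be illustrated with a minimal base R sketch. It is not stringdist's implementation: `qgram_counts()`, `qgram_dist()` and `qgram_sim()` are hypothetical helpers, and the maximum distance is taken to be the total number of q-grams in both strings, which is the largest value the q-gram distance can reach (when the strings share no q-gram).

```r
# Hypothetical helpers, not stringdist functions: a q-gram similarity
# normalized as 1 - distance / maximum possible distance.

# Count the q-grams (substrings of length q) occurring in a single string
qgram_counts <- function(s, q) {
  n <- nchar(s) - q + 1
  if (n < 1) return(integer(0))
  tab <- table(substring(s, 1:n, (1:n) + q - 1))
  counts <- as.vector(tab)
  names(counts) <- names(tab)
  counts
}

# q-gram distance: sum of absolute count differences over all q-grams
qgram_dist <- function(a, b, q = 1) {
  ca <- qgram_counts(a, q)
  cb <- qgram_counts(b, q)
  grams <- union(names(ca), names(cb))
  va <- ca[grams]
  vb <- cb[grams]
  va[is.na(va)] <- 0   # q-grams absent from a
  vb[is.na(vb)] <- 0   # q-grams absent from b
  sum(abs(va - vb))
}

# Similarity: divide by the total number of q-grams in both strings
qgram_sim <- function(a, b, q = 1) {
  dmax <- max(nchar(a) - q + 1, 0) + max(nchar(b) - q + 1, 0)
  if (dmax == 0) return(1)
  1 - qgram_dist(a, b, q) / dmax
}
```

With `q = 1`, the anagrams `"listen"` and `"silent"` have identical character counts, so the sketch returns a similarity of 1 even though the strings differ. With `q = 2` they share only the bigram `"en"`, giving a distance of 8 out of a possible 10 and a similarity of 0.2.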

## Value

`stringsim` returns a vector of similarities, which are values between 0 and 1, where 1 corresponds to perfect similarity (distance 0) and 0 to complete dissimilarity. `NA` is returned when `stringdist` returns `NA`. Distances equal to `Inf` are truncated to a similarity of 0. `stringsimmatrix` works the same way but, like `stringdistmatrix`, returns a similarity matrix instead of a vector.
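The handling of `NA` and `Inf` can be illustrated with the Hamming distance, which is infinite for strings of unequal length. Below is a minimal base R sketch under that assumption; `hamming_dist()` and `hamming_sim()` are hypothetical helpers, not part of stringdist, and the maximum distance is taken to be the common string length.

```r
# Hypothetical helpers, not stringdist functions: a Hamming similarity that
# follows the rules above for NA and infinite distances.

# Hamming distance: number of differing positions; Inf for unequal lengths
hamming_dist <- function(x, y) {
  if (is.na(x) || is.na(y)) return(NA_real_)
  if (nchar(x) != nchar(y)) return(Inf)
  if (nchar(x) == 0) return(0)
  sum(strsplit(x, "")[[1]] != strsplit(y, "")[[1]])
}

hamming_sim <- function(a, b) {
  a <- as.character(a)
  b <- as.character(b)
  # mapply() recycles the shorter vector
  d <- mapply(hamming_dist, a, b, USE.NAMES = FALSE)
  # For equal lengths the largest possible Hamming distance is the length
  dmax <- nchar(a)
  sim <- 1 - d / dmax                  # NA distances stay NA
  sim[which(is.infinite(d))] <- 0      # truncate Inf distances to 0
  sim[which(d == 0)] <- 1              # identical strings, including "" vs ""
  sim
}
```

For example, `hamming_sim(c("karolin", "abc", NA), c("kathrin", "abcd", "xyz"))` gives 4/7 for the first pair (three differing positions out of seven), 0 for the pair of unequal length, and `NA` for the pair with a missing value.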

## Examples

```r
library(stringdist)

# Calculate the similarity using the default method of optimal string alignment
stringsim("ca", "abc")

# Calculate the similarity using the Jaro-Winkler method
# The p argument is passed on to stringdist
stringsim("MARTHA", "MATHRA", method = "jw", p = 0.1)
```

stringdist documentation built on Sept. 9, 2021, 5:08 p.m.