tidy_stringdist: Tidy stringdist calculation

Description

Computes string distances between two columns of a data frame and returns the results as a tibble.

Usage

tidy_stringdist(df, v1 = V1, v2 = V2, method = c("osa", "lv", "dl",
  "hamming", "lcs", "qgram", "cosine", "jaccard", "jw", "soundex"), ...)

Arguments

df

a dataframe containing the strings to compare

v1

the name of the first column to compare

v2

the name of the second column to compare

method

one or more of the methods implemented in the stringdist package: "osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw", "soundex". By default all of them are computed, as the example output shows. See stringdist-metrics

...

other parameters passed to stringdist
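
The arguments above can be combined as in the following minimal sketch. It assumes the tidystringdist package is installed; the data frame `pairs` and its column names `a` and `b` are hypothetical and chosen only for illustration.

```r
# Assumes the tidystringdist package is installed.
library(tidystringdist)

# Hypothetical input: one pair of strings per row, in columns `a` and `b`.
pairs <- data.frame(
  a = c("kitten", "flaw"),
  b = c("sitting", "lawn"),
  stringsAsFactors = FALSE
)

# Compare column `a` with column `b` using only the Levenshtein distance.
lv_only <- tidy_stringdist(pairs, v1 = a, v2 = b, method = "lv")
lv_only

# Extra arguments in `...` are passed on to stringdist, here the q-gram size `q`.
qgram_2 <- tidy_stringdist(pairs, v1 = a, v2 = b, method = "qgram", q = 2)
qgram_2
```

Restricting `method` keeps the result narrow when only one or two metrics are needed; the default call computes every metric listed above.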

Value

a tibble containing the two compared columns plus one column of string distances for each requested method

Examples

proust <- tidy_comb_all(c("Albertine", "Françoise", "Gilberte", "Odette", "Charles"))
tidy_stringdist(proust)

Example output

# A tibble: 10 x 12
   V1    V2      osa    lv    dl hamming   lcs qgram cosine jaccard    jw
 * <chr> <chr> <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl>
 1 Albe… Fran…     7     7     7       7    12    10  0.497   0.692 0.444
 2 Albe… Gilb…     4     4     4     Inf     5     3  0.142   0.333 0.194
 3 Albe… Odet…     6     6     6     Inf     9     9  0.428   0.8   0.389
 4 Albe… Char…     8     8     8     Inf    12    10  0.544   0.75  0.579
 5 Fran… Gilb…     8     8     8     Inf    13    11  0.578   0.769 0.588
 6 Fran… Odet…     8     8     8     Inf    13    13  0.789   0.917 0.574
 7 Fran… Char…     7     7     7     Inf    12     8  0.496   0.667 0.495
 8 Gilb… Odet…     5     5     5     Inf     8     8  0.4     0.778 0.375
 9 Gilb… Char…     7     7     7     Inf    11     9  0.522   0.727 0.565
10 Odet… Char…     6     6     6     Inf    11    11  0.761   0.9   0.563
# … with 1 more variable: soundex <dbl>
Warning message:
In do_dist(a = b, b = a, method = method, weight = weight, q = q,  :
  Non-printable ascii or non-ascii characters in soundex. Results may be unreliable. See ?printable_ascii.
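
The warning above is raised by the soundex method because "Françoise" contains a non-ASCII character. As a hedged sketch, assuming the stringdist and tidystringdist packages are installed, the `printable_ascii()` helper named in the warning can be used to find such strings, and "soundex" can be left out of `method`. The variable names are illustrative only.

```r
library(stringdist)
library(tidystringdist)

characters <- c("Albertine", "Françoise", "Gilberte", "Odette", "Charles")

# printable_ascii() returns TRUE for strings made only of printable ASCII characters.
ascii_ok <- printable_ascii(characters)
non_ascii <- characters[!ascii_ok]
non_ascii

# Leaving "soundex" out of `method` skips the soundex computation for these strings.
proust <- tidy_comb_all(characters)
no_soundex <- tidy_stringdist(proust, method = c("osa", "lv", "jw"))
no_soundex
```

Whether to drop soundex or to transliterate the input first depends on the data; this sketch only shows the first option.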

tidystringdist documentation built on May 2, 2019, 3:23 p.m.