knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)

Coverage Status

Travis-CI Build Status

tidystringdist

Compute string distance the tidy way. Built on top of the stringdist package.

Install tidystringdist

You'll get the dev version on:

devtools::install_github("ColinFay/tidystringdist")

Stable version is available with :

install.packages("tidystringdist")

tidystringdist basic workflow

tidycomb

First, you need to create a tibble with the combinations of words you want to compare. You can do this with the tidy_comb and tidy_comb_all functions. The first takes a base word and combines it with each elements of a list or a column of a data.frame, the 2nd combines all the possible couples from a list or a column.

If you already have a data.frame with two columns containing the strings to compare, you can skip this part.

library(tidystringdist)

tidy_comb_all(LETTERS[1:3])
tidy_comb_all(iris, Species)
tidy_comb("Paris", state.name[1:3])

tidy_string_dist

Once you've got this data.frame, you can use tidy_string_dist() to compute string distance. This function takes a data.frame, the two columns containing the strings, and one or more stringdist methods.

Note that if you've used the tidy_comb function to create your data.frame, you won't need to set the column names.

library(dplyr)
data(starwars)
tidy_comb_sw <- tidy_comb_all(starwars, name)
tidy_stringdist(tidy_comb_sw)

Default call compute all the methods. You can use specific method with the method argument:

tidy_stringdist(tidy_comb_sw, method = c("osa","jw"))

Tidyverse workflow

The goal is to provide a convenient interface to work with other tools from the tidyverse.

tidy_stringdist(tidy_comb_sw, method= "osa") %>%
  filter(osa > 20) %>%
  arrange(desc(osa))
starwars %>%
  filter(species == "Droid") %>%
  tidy_comb_all(name) %>%
  tidy_stringdist() %>% 
  summarise_if(is.numeric, mean)

Contact

Questions and feedbacks welcome!



ColinFay/tidystringdist documentation built on May 22, 2019, 4:28 p.m.