Compute string distance the tidy way. Built on top of the stringdist
package.
Install the development version from GitHub:
devtools::install_github("ColinFay/tidystringdist")
The stable version is available from CRAN:
install.packages("tidystringdist")
First, create a tibble with the combinations of words you want to compare. You can do this with the tidy_comb and tidy_comb_all functions. The first takes a base word and combines it with each element of a list or a column of a data.frame; the second builds all possible pairs from a list or a column.
If you already have a data.frame with two columns containing the strings to compare, you can skip this part.
library(tidystringdist)
tidy_comb_all(LETTERS[1:3])
tidy_comb_all(iris, Species)
tidy_comb("Paris", state.name[1:3])
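To make the two kinds of combination concrete, here is a minimal base-R sketch of the shape of table they describe. It is not the package's own code: the column names V1 and V2, and the choice to list each unordered pair once, are assumptions made for illustration.

```r
# Base-R sketch of the two kinds of combination (illustrative only).
words <- c("A", "B", "C")

# All unordered pairs of `words`, one row per pair.
pairs <- t(combn(words, 2))
all_pairs <- data.frame(V1 = pairs[, 1], V2 = pairs[, 2],
                        stringsAsFactors = FALSE)

# One base word paired with every element of a vector.
base_pairs <- data.frame(V1 = "Paris", V2 = state.name[1:3],
                         stringsAsFactors = FALSE)

print(all_pairs)
print(base_pairs)
```

Either table has one row per pair of strings to compare, which is the input the distance step expects.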
Once you have this data.frame, you can use tidy_stringdist() to compute string distances. This function takes a data.frame, the two columns containing the strings, and one or more stringdist methods.
Note that if you used tidy_comb or tidy_comb_all to create your data.frame, you don't need to set the column names.
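As a rough illustration of what "one distance per row" means, the sketch below uses only base R. It relies on utils::adist(), which computes the Levenshtein distance (the measure stringdist calls "lv"); the example words and the column names V1, V2 and lv are assumptions chosen for the demonstration, and this is not how the package computes its results.

```r
# Base-R sketch: add one distance column to a two-column table of strings.
pairs <- data.frame(V1 = c("kitten", "flaw"),
                    V2 = c("sitting", "lawn"),
                    stringsAsFactors = FALSE)

# adist() returns a matrix, so compare each row's pair separately
# and keep the single value it yields.
pairs$lv <- mapply(function(a, b) as.numeric(utils::adist(a, b)),
                   pairs$V1, pairs$V2, USE.NAMES = FALSE)

print(pairs)
```

The result keeps the original two columns and adds a numeric column, so it can be filtered or sorted like any other data.frame.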
library(dplyr)
data(starwars)
tidy_comb_sw <- tidy_comb_all(starwars, name)
tidy_stringdist(tidy_comb_sw)
The default call computes all the methods. You can select specific methods with the method argument:
tidy_stringdist(tidy_comb_sw, method = c("osa", "jw"))
The goal is to provide a convenient interface to work with other tools from the tidyverse.
tidy_stringdist(tidy_comb_sw, method = "osa") %>%
  filter(osa > 20) %>%
  arrange(desc(osa))
starwars %>%
  filter(species == "Droid") %>%
  tidy_comb_all(name) %>%
  tidy_stringdist() %>%
  summarise_if(is.numeric, mean)
Questions and feedback are welcome!