Helps to find inexact matches (e.g. Nestlé vs Nestle) in text data.
Install from GitHub:

```r
devtools::install_github("richardvogg/fuzzymatch")
```
A short example using TidyTuesday data (2021, week 5):
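```r
# Load the TidyTuesday data for 2021, week 5
tuesdata <- tidytuesdayR::tt_load('2021', 5)
plastics <- tuesdata$plastics

# With find_cutoff = TRUE, inspect candidate near-duplicates to help pick a cutoff
dedupes <- fuzzymatch::fuzzy_dedupes(plastics$parent_company, find_cutoff = TRUE)
```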
The output is sorted by string distance, with the closest pairs first. I checked that my cutoff would cover the Nestlé / Nestle pair, which had a distance of 0.067.
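This section does not say which string distance produces the 0.067. As a hedged illustration, Jaro-Winkler distance with a prefix weight of `p = 0.1` (the metric the stringdist package offers as `method = "jw"`) gives exactly 1/15 ≈ 0.067 for Nestlé vs Nestle, so it is one plausible candidate; whether fuzzymatch uses it is an assumption. The base-R sketch computes it with a hypothetical `jw_distance()` helper (not part of fuzzymatch) and lists candidate pairs closest first, mirroring the sorted output described above.

```r
# Hypothetical helper, not part of fuzzymatch: Jaro-Winkler distance between
# two strings in base R. Strings are compared as Unicode code points, so
# "é" and "e" count as different characters.
jw_distance <- function(a, b, p = 0.1) {
  x <- utf8ToInt(enc2utf8(a))
  y <- utf8ToInt(enc2utf8(b))
  n1 <- length(x)
  n2 <- length(y)
  if (n1 == 0 && n2 == 0) return(0)
  if (n1 == 0 || n2 == 0) return(1)
  # Characters only count as matching within this positional window
  window <- max(floor(max(n1, n2) / 2) - 1, 0)
  x_match <- logical(n1)
  y_match <- logical(n2)
  for (i in seq_len(n1)) {
    lo <- max(1, i - window)
    hi <- min(n2, i + window)
    if (lo > hi) next
    for (j in lo:hi) {
      if (!y_match[j] && x[i] == y[j]) {
        x_match[i] <- TRUE
        y_match[j] <- TRUE
        break
      }
    }
  }
  m <- sum(x_match)
  if (m == 0) return(1)
  # Half the number of matched characters that appear in a different order
  transpositions <- sum(x[x_match] != y[y_match]) / 2
  jaro <- (m / n1 + m / n2 + (m - transpositions) / m) / 3
  # Winkler bonus for a shared prefix of up to 4 characters
  prefix <- 0
  for (k in seq_len(min(4, n1, n2))) {
    if (x[k] == y[k]) prefix <- prefix + 1 else break
  }
  1 - (jaro + prefix * p * (1 - jaro))
}

# Score every pair of candidate names and list the closest pairs first
candidates <- c("Nestle", "Nestl\u00e9", "PepsiCo", "Unilever")
pairs <- t(combn(candidates, 2))
scores <- data.frame(
  a = pairs[, 1],
  b = pairs[, 2],
  dist = mapply(jw_distance, pairs[, 1], pairs[, 2], USE.NAMES = FALSE)
)
scores[order(scores$dist), ]
```

The `"\u00e9"` escape keeps the accented name independent of the source file's encoding.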
```r
# Merge near-duplicate company names, using a cutoff just above 0.067
plastics$parent_company <- fuzzymatch::fuzzy_dedupes(plastics$parent_company, cutoff_distance = 0.08)
```
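How `fuzzy_dedupes()` chooses the replacement spelling is not described in this section. One simple way a cutoff-based replacement could work is sketched below with a hypothetical `dedupe_with_cutoff()` helper (not fuzzymatch's code): treat the most frequent spelling as canonical and map any rarer spelling within the cutoff onto it. As an assumption for self-containment, it uses base R's `adist()` (Levenshtein edits) divided by the longer string's length. That is a different scale from fuzzymatch's: Nestlé vs Nestle scores 1/6 ≈ 0.167 here, so the 0.08 cutoff does not transfer.

```r
# Hypothetical cutoff-based deduplication, for illustration only.
# Distance: Levenshtein edits (utils::adist) divided by the longer string's
# length, so 0 means identical and 1 means nothing in common.
dedupe_with_cutoff <- function(x, cutoff) {
  counts <- table(x)
  # Most frequent spellings first, so they become the canonical form
  values <- names(sort(counts, decreasing = TRUE))
  d <- adist(values)
  len <- nchar(values)
  d <- d / outer(len, len, pmax)
  canonical <- setNames(values, values)
  for (i in seq_along(values)) {
    for (j in seq_along(values)) {
      # Map a rarer, not-yet-mapped spelling j onto the more frequent spelling i
      if (j > i && canonical[[values[j]]] == values[j] && d[i, j] <= cutoff) {
        canonical[[values[j]]] <- canonical[[values[i]]]
      }
    }
  }
  unname(canonical[x])
}

x <- c("Nestle", "Nestle", "Nestl\u00e9", "Unilever")
dedupe_with_cutoff(x, cutoff = 0.2)   # Nestlé is mapped to Nestle
dedupe_with_cutoff(x, cutoff = 0.08)  # nothing is merged on this scale
```

Preferring the most frequent spelling keeps the form most rows already use, which matters when counting companies afterwards.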
I was looking for the top 5 polluters. Since Nestlé is certainly one of them, the company names needed to be as clean as possible.