The goal of {dfuzz} is to help you clean up a messy column of character strings in your tibble or data.frame.
This package is highly experimental and is not yet ready for real applications.
It is built around two dependencies which themselves have no dependencies:
{stringdist}, and it is possible to use the full power of the function stringdist()
from this excellent package.
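To give a feel for what stringdist() offers, here is a minimal sketch that calls {stringdist} directly. It assumes {stringdist} is installed from CRAN. It does not show how {dfuzz} calls stringdist() internally or how it forwards arguments to it, and the vectors `messy` and `candidates` are made up for illustration.

```r
# Assumption: {stringdist} is installed, e.g. install.packages("stringdist")
library(stringdist)

# Illustrative inputs (not part of {dfuzz})
messy <- c("aple", "ApplE", "pinapple", "bonana")
candidates <- c("apple", "banana", "pineapple")

# Default method is "osa" (optimal string alignment): an edit distance
# counting insertions, deletions, substitutions and adjacent transpositions.
d_apple <- stringdist(messy, "apple")

# Other metrics are chosen with `method`, e.g. Jaro-Winkler ("jw" with p > 0).
d_jw <- stringdist(messy, "apple", method = "jw", p = 0.1)

# amatch() returns the index of the closest candidate, or NA when no
# candidate is within maxDist.
closest <- candidates[amatch(messy, candidates, maxDist = 2)]

print(data.frame(messy, d_apple, d_jw, closest))
```

Note that the distance is case sensitive, so "ApplE" is two substitutions away from "apple". Harmonising case before matching usually helps.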
{dfuzz} aims at being compatible with both tidyverse and base R dialects.
You can install this package using {remotes} (or {devtools}):
remotes::install_github("courtiol/dfuzz")
library(dfuzz)

## a toy example:
test_df <- data.frame(fruit = c("banana", "blueberry", "limon", "pinapple",
                                "aple", "apple", "ApplE", "bonana"))
test_df

## fast and dirty workflow:
clean_df1 <- fuzzy_tidy(test_df, fruit)
clean_df1

## more subtle workflow:
template_fruit <- fuzzy_match(test_df, fruit)
template_fruit
template_fruit$selected[1] <- "apple"
clean_df2 <- fuzzy_tidy(test_df, fruit, template_fruit)
clean_df2

## fast and dirty workflow with {tidyverse}:
library(tidyverse)
test_df %>%
  fuzzy_tidy(fruit) %>%
  mutate(fruit = fruit.tidy) %>%
  select(-contains("fruit."))

## more subtle workflow with {tidyverse}:
test_df %>%
  mutate(fruit = str_to_title(fruit)) %>%
  fuzzy_match(fruit) -> template_fruit
template_fruit

template_fruit %>%
  mutate(selected = fct_recode(selected, Apple = "Aple")) -> better_template_fruit
better_template_fruit

test_df %>%
  mutate(fruit = str_to_title(fruit)) %>%
  fuzzy_tidy(fruit, better_template_fruit) -> clean_df3
clean_df3

clean_df3 %>%
  mutate(fruit = fruit.tidy) %>%
  select(-contains("fruit."))
If you think this package is an idea worth pursuing, please let me know. Development is always more fun as a collaborative effort, so please email me (or open an issue) if you want to get involved!