fuzzy_lookup | R Documentation |
Soft-searches a column against a lookup table and adds a new column containing the standardized terms. Designed for data cleaning and standardization. Combines soft (regex) search with lookup tables, which is currently difficult to do with dplyr::case_when()/case_match().
fuzzy_lookup(
  lookup,
  search_term,
  replace_term,
  .df,
  search_col,
  new_col,
  .default = "other",
  ignore.case = FALSE
)
lookup | Lookup table (tibble) with at least 2 columns. |
search_term | First column in lookup, containing the soft strings used for the regex search. |
replace_term | Second column in lookup, containing the categorical data relating to each search term. |
.df | Tibble to search through. Default behaviour will add a new column containing the replace_term where appropriate. |
search_col | Name of the .df column to search for fuzzy matches. |
new_col | Name of the new column in .df that will contain the categorical data. |
.default | Value to use if no match is found. This can be a column name from .df or a user-defined value. Defaults to 'other'. |
ignore.case | Whether to ignore case when fuzzy matching. This is passed to str_detect(search_col, regex(search_term, ignore_case = )). |
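As a rough illustration of what ignore.case changes, the snippet below calls stringr directly on two model names from mtcars. It assumes the stringr package is installed. The vector name models and the two result names are made up for this sketch and are not part of fuzzy_lookup().

```r
library(stringr)

# Two model names as they appear in mtcars (illustrative vector)
models <- c("Merc 240D", "Hornet 4 Drive")

# Case-sensitive search: lower-case "merc" does not match "Merc"
case_sensitive <- str_detect(models, regex("merc", ignore_case = FALSE))

# Case-insensitive search: "merc" now matches "Merc 240D"
case_insensitive <- str_detect(models, regex("merc", ignore_case = TRUE))

print(case_sensitive)    # FALSE FALSE
print(case_insensitive)  # TRUE FALSE
```

With the default ignore.case = FALSE, a lookup table would therefore likely need one row per capitalization, such as both 'Merc' and 'merc'.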
Maps over lookup table and runs str_detect(string, regex(query)) under the hood.
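The idea of mapping over a lookup table with str_detect() can be sketched roughly as follows. This is not the package's source code: sketch_lookup() is a hypothetical helper written for illustration, it works on plain character vectors rather than tibbles, and it assumes that the first matching lookup row wins, which may differ from what fuzzy_lookup() actually does.

```r
library(stringr)

# Hypothetical helper: illustrates the general technique only
sketch_lookup <- function(x, patterns, replacements,
                          default = "other", ignore_case = FALSE) {
  out <- rep(NA_character_, length(x))
  for (i in seq_along(patterns)) {
    # Regex search of every element of x for the i-th search term
    matched <- str_detect(x, regex(patterns[i], ignore_case = ignore_case))
    # Assumption: only fill elements that no earlier row has matched
    hit <- which(is.na(out) & matched)
    out[hit] <- replacements[i]
  }
  # Anything still unmatched gets the default value
  out[is.na(out)] <- default
  out
}

cleaned <- sketch_lookup(
  c("Mazda RX4", "Merc 240D", "Hornet 4 Drive", "Valiant"),
  patterns     = c("mazda rx4", "merc", "hornet"),
  replacements = c("Mazda RX4", "Mercedes", "Hornet"),
  ignore_case  = TRUE
)
print(cleaned)  # "Mazda RX4" "Mercedes" "Hornet" "other"
```

Because each search term is treated as a regular expression, characters such as . or ( in a lookup entry would need escaping to be matched literally.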
A data frame / tibble: .df with new_col added.
requireNamespace("tibble")

# Create a tibble from the mtcars data, keeping row names as the 'model' column
mtcars_tbl <- tibble::as_tibble(mtcars, rownames = 'model')

# Create a lookup table with the soft search term (column 1) and the
# new standardized/consistent term (column 2)
lookup_tbl <- tibble::tribble(
  ~key1,       ~key2,
  'mazda rx4', 'Mazda RX4',
  'Merc',      'Mercedes',
  'merc',      'Mercedes',
  'HORNET',    'Hornet',
  'hornet',    'Hornet'
)

# Case-insensitive search; unmatched rows fall back to the wt column of .df
fuzzy_lookup(lookup = lookup_tbl,
             search_term = key1, replace_term = key2,
             .df = mtcars_tbl, search_col = 'model', new_col = 'model_clean',
             .default = wt, ignore.case = TRUE)

# Case-sensitive search without the first lookup row; unmatched rows get
# the default value 'other'
fuzzy_lookup(lookup = lookup_tbl |> dplyr::slice(-1),
             search_term = key1, replace_term = key2,
             .df = mtcars_tbl, search_col = 'model', new_col = 'model_clean')