fuzzy_lookup: Fuzzy lookup

View source: R/fuzzy_lookup.R

fuzzy_lookup        R Documentation

Fuzzy lookup

Description

Performs a soft (regular-expression) search of a column against a lookup table and adds a new column containing the standardized terms. Designed for data cleaning and standardization. It supports soft matching and the use of lookup tables, both of which are currently awkward to express with dplyr::case_when() or dplyr::case_match().

Usage

fuzzy_lookup(
  lookup,
  search_term,
  replace_term,
  .df,
  search_col,
  new_col,
  .default = "other",
  ignore.case = FALSE
)

Arguments

lookup

Lookup table (tibble) with at least 2 columns.

search_term

Column in lookup containing the soft search strings, used as regular expressions when searching search_col.

replace_term

Column in lookup containing the standardized (categorical) value associated with each search term.

.df

Tibble to search. By default, a new column is added that contains the matching replace_term value for each row where a search term is found.

search_col

Name of the column in .df to search for fuzzy matches.

new_col

Name of the new column in .df that will hold the standardized (categorical) values.

.default

Value used when no search term matches. This can be either a column of .df (as with .default = wt in the examples) or a user-defined value. Defaults to 'other'.
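The difference between a constant and a column-valued .default can be pictured with the base-R sketch below. It is a hypothetical illustration, not tidyExt code: it uses grepl() and ifelse() instead of stringr, and it assumes the result column is character, so numeric values taken from a column such as wt are coerced to text.

```r
# Hypothetical illustration (base R, not the tidyExt implementation) of how an
# unmatched row can fall back to a constant or to another column's value.
df <- data.frame(model = c("Merc 240D", "Fiat 128"),
                 wt    = c(3.19, 2.2))

# Standardized value where the search term matches, NA otherwise
matched <- ifelse(grepl("Merc", df$model), "Mercedes", NA_character_)

# Constant fallback, analogous to .default = "other"
with_constant <- ifelse(is.na(matched), "other", matched)

# Column fallback, analogous to .default = wt (assumes coercion to character)
with_column <- ifelse(is.na(matched), as.character(df$wt), matched)

print(with_constant)
print(with_column)
```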

ignore.case

Logical; if TRUE, case is ignored when fuzzy matching. The value is passed as the ignore_case argument of stringr::regex(), i.e. str_detect(search_col, regex(search_term, ignore_case = ignore.case)). Defaults to FALSE.
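The effect of ignore.case can be checked with base R's grepl(), which has an argument of the same name. This is an analogue of str_detect(x, regex(pattern, ignore_case = ...)), not the call the function makes; note that grepl() uses a different regex engine (TRE) from stringr (ICU), although simple literal patterns like these behave the same.

```r
# Base-R analogue of str_detect(x, regex(pattern, ignore_case = ...))
models <- c("Mazda RX4", "Hornet 4 Drive", "Merc 240D")

# Lower-case 'hornet' does not match 'Hornet' when case matters
case_sensitive <- grepl("hornet", models, ignore.case = FALSE)

# Ignoring case lets 'hornet' match 'Hornet 4 Drive'
case_insensitive <- grepl("hornet", models, ignore.case = TRUE)

print(case_sensitive)
print(case_insensitive)
```

This is why the example lookup table lists both 'Merc' and 'merc' (and 'HORNET' and 'hornet'): with ignore.case = FALSE, each spelling must be listed separately.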

Details

Under the hood, the function maps over the rows of the lookup table and, for each search term, runs str_detect(string, regex(query)) on search_col.
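The loop described above can be sketched in base R as follows. The function name fuzzy_lookup_sketch is hypothetical and this is not the tidyExt source: column arguments are passed as strings rather than bare names, grepl() stands in for stringr::str_detect(), and it is an assumption that the first matching lookup row wins when several search terms match the same value.

```r
# Hypothetical base-R sketch of a regex search driven by a lookup table.
# NOT the tidyExt implementation; column arguments are strings here.
fuzzy_lookup_sketch <- function(lookup, search_term, replace_term,
                                .df, search_col, new_col,
                                .default = "other", ignore.case = FALSE) {
  x <- as.character(.df[[search_col]])
  out <- rep(NA_character_, length(x))
  # Map over the lookup rows, treating each search term as a regex
  for (i in seq_len(nrow(lookup))) {
    pattern <- as.character(lookup[[search_term]][i])
    hit <- grepl(pattern, x, ignore.case = ignore.case)
    # Assumption: only fill rows not already matched by an earlier term
    fill <- hit & is.na(out)
    out[fill] <- as.character(lookup[[replace_term]][i])
  }
  # Unmatched rows get the default; a scalar is recycled, while a
  # full-length vector (e.g. a column) is used row by row
  miss <- is.na(out)
  default <- rep_len(as.character(.default), length(x))
  out[miss] <- default[miss]
  .df[[new_col]] <- out
  .df
}

lookup_tbl <- data.frame(key1 = c("mazda rx4", "Merc", "hornet"),
                         key2 = c("Mazda RX4", "Mercedes", "Hornet"))
cars <- data.frame(model = c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710",
                             "Hornet 4 Drive", "Merc 240D", "Valiant"))

res <- fuzzy_lookup_sketch(lookup_tbl, "key1", "key2", cars,
                           "model", "model_clean", ignore.case = TRUE)
print(res)
```

Because each search term is a regular expression rather than an exact key, 'mazda rx4' also matches 'Mazda RX4 Wag'. Patterns that must match whole values would need anchors such as ^ and $.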

Value

A data frame (tibble): .df with the additional column new_col.

Examples

# These examples use tibble and dplyr, and the native pipe |> (R >= 4.1)
requireNamespace("tibble")
requireNamespace("dplyr")

# Create a tibble from mtcars, keeping the row names in a 'model' column
mtcars_tbl <- tibble::as_tibble(mtcars, rownames = 'model')

# Create a lookup table pairing each soft search term (key1) with its
# standardized term (key2)
lookup_tbl <- tibble::tribble(~key1,       ~key2,
                              'mazda rx4', 'Mazda RX4',
                              'Merc',      'Mercedes',
                              'merc',      'Mercedes',
                              'HORNET',    'Hornet',
                              'hornet',    'Hornet')

# Case-insensitive matching; unmatched rows take their value from the wt column
fuzzy_lookup(lookup = lookup_tbl,
             search_term = key1, replace_term = key2,
             .df = mtcars_tbl, search_col = 'model', new_col = 'model_clean',
             .default = wt, ignore.case = TRUE)

# Drop the first lookup row; case-sensitive matching with .default = 'other'
fuzzy_lookup(lookup = lookup_tbl |> dplyr::slice(-1),
             search_term = key1, replace_term = key2,
             .df = mtcars_tbl, search_col = 'model', new_col = 'model_clean')


bansell/tidyExt documentation built on July 12, 2024, 12:58 p.m.