fuzzy_tidy: Tidy a messy column of character strings in a table

Description

This function takes a column of character strings and attempts to tidy it by matching similar spellings, using string distances computed with stringdist().

Usage

fuzzy_tidy(
  .data,
  stringvar,
  template = NULL,
  threshold = options("fuzzy_threshold")[[1]],
  ...
)

Arguments

.data

A data.frame or tbl

stringvar

The name of the column to tidy (quoted or unquoted)

template

A lookup table created with fuzzy_match() (optional)

threshold

The distance threshold that decides whether two strings are considered close enough to be matched

...

Additional arguments passed to stringdist()

Details

The function adds three new columns to the dataset. For example, if your target column is called fruit, it will add:

If the tidying is not satisfactory, try adjusting the threshold argument, either directly when calling the function or by setting a package-wide option with options(fuzzy_threshold = X), where X is the value of your choice (this is the option read by the default value of threshold). You can also refine the tidying by passing fine-tuning arguments to the underlying workhorse, stringdist(), through the ... argument.
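To illustrate how a distance threshold, a package-wide option and pass-through arguments interact, here is a minimal base-R sketch. It is not the dfuzz implementation: the helper tidy_by_threshold() is hypothetical, base R's utils::adist() (generalized Levenshtein distance) stands in for stringdist() so that no extra package is needed, and the rule "replace each string by the most frequent spelling within threshold edits" is an assumption chosen for illustration.

```r
# Hypothetical helper (not part of dfuzz): replaces each string with the most
# frequent spelling lying within `threshold` edits of it. Base R's adist()
# stands in for stringdist(); extra arguments in ... are passed to adist().
# The default threshold is read from the "fuzzy_threshold" option (fallback: 1).
tidy_by_threshold <- function(x, threshold = getOption("fuzzy_threshold", 1), ...) {
  counts <- table(x)                     # how often each distinct spelling occurs
  uniq <- names(counts)
  freq <- as.vector(counts)
  d <- utils::adist(uniq, uniq, ...)     # pairwise edit distances between spellings
  canonical <- vapply(seq_along(uniq), function(i) {
    close <- which(d[i, ] <= threshold)  # spellings close enough to uniq[i]
    uniq[close[which.max(freq[close])]]  # most frequent one wins (first on ties)
  }, character(1))
  canonical[match(x, uniq)]              # map every original value to its spelling
}

messy <- c("apple", "apple", "aple", "Apple", "banana", "bonana", "banana")
tidy_by_threshold(messy, threshold = 1)
# returns c("apple", "apple", "apple", "apple", "banana", "banana", "banana")
tidy_by_threshold(messy, threshold = 0)  # nothing is merged; returns messy unchanged
tidy_by_threshold(messy, threshold = 1, ignore.case = TRUE)  # extra argument for adist()
```

Raising the threshold merges more spellings but also increases the risk of merging genuinely different words, which is why it is exposed both as an argument and as an option.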

You can use a template created with fuzzy_match() to control how the messy strings are tidied.
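A template lets you decide the target spelling yourself rather than leaving it to the distance threshold. The base-R sketch below shows the general lookup idea only; the column names messy and tidy and the helper apply_template() are hypothetical, and the table actually returned by fuzzy_match() may be structured differently.

```r
# Hypothetical template: a two-column lookup table mapping messy spellings to
# the spelling to keep. The real output of fuzzy_match() may differ.
template <- data.frame(
  messy = c("aple", "Apple", "bonana", "pinapple"),
  tidy  = c("apple", "apple", "banana", "pineapple"),
  stringsAsFactors = FALSE
)

# Replace every value listed in template$messy; leave other values unchanged.
apply_template <- function(x, template) {
  idx <- match(x, template$messy)        # NA when a value has no template entry
  ifelse(is.na(idx), x, template$tidy[idx])
}

apply_template(c("banana", "aple", "Apple", "bonana", "kiwi"), template)
# returns c("banana", "apple", "apple", "banana", "kiwi")
```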

See Also

fuzzy_pool(), fuzzy_match(), stringdist()

Examples

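library(dfuzz)
# A toy dataset with misspelled fruit names
test_df <- data.frame(fruit = c("banana", "blueberry", "limon", "pinapple",
                                "apple", "aple", "Apple", "bonana"),
                      number = 1:8)
# Build a lookup table of similar spellings, then inspect it
fuzzy_template <- fuzzy_match(test_df, fruit)
fuzzy_template
# Tidy the column using the template, then without one
fuzzy_tidy(test_df, "fruit", fuzzy_template)
fuzzy_tidy(test_df, "fruit")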

courtiol/dfuzz documentation built on Oct. 28, 2020, 6 a.m.