This function takes a column of character strings and attempts to tidy it.
fuzzy_tidy(
  .data,
  stringvar,
  template = NULL,
  threshold = options("fuzzy_threshold")[[1]],
  ...
)
.data | A
stringvar | The name of the column to tidy (quoted or not)
template | A lookup table created with
threshold | The minimum distance between strings considered as OK
... | Additional arguments passed to the underlying stringdist() function
The function adds three new columns to the dataset. For example, if your target column is called fruit, it will add:
fruit.clean
a column containing only the elements that were considered OK
fruit.cleaned
a column containing only the elements substituted for the ones considered messy
fruit.tidy
a column with all the elements after cleaning, and thus the proposed replacement for your original column
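The relationship between the three columns can be sketched in base R. This is a hand-built illustration, not output from fuzzy_tidy(): the values are typed in by hand, and the idea that each column holds NA where it does not apply is an assumption made for the sketch.

```r
# Hypothetical illustration only: these values are written by hand,
# they are NOT produced by fuzzy_tidy().
fruit <- c("banana", "bonana", "apple", "aple")

# Assumption: elements considered OK are kept, messy ones are NA.
fruit.clean <- c("banana", NA, "apple", NA)

# Assumption: only the substitutes for messy elements appear here.
fruit.cleaned <- c(NA, "banana", NA, "apple")

# The tidy column combines both: keep the clean value where there is one,
# otherwise take the substitute.
fruit.tidy <- ifelse(is.na(fruit.clean), fruit.cleaned, fruit.clean)

data.frame(fruit, fruit.clean, fruit.cleaned, fruit.tidy)
```

Under these assumptions, fruit.tidy is the only column without gaps, which is why it is the natural candidate to replace the original column.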
If the tidying does not satisfy you, consider adjusting the threshold argument, either directly when calling the function or by setting a general option for the package with options("fuzzy_threshold" = X), where X is the number of your choice. You can also improve the tidying by passing fine-tuning arguments to the underlying workhorse, stringdist(), through the ... argument.
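The two ways of setting the threshold, and the forwarding of extra arguments through ..., can be sketched with a small stand-in function. Everything here is an assumption for illustration: flag_close() is a hypothetical helper, not part of the package; it uses base R's utils::adist() in place of stringdist(); it reads the option name shown in the usage signature; and treating "distance at or below the threshold" as acceptable is a guess at how the comparison works.

```r
# Hypothetical stand-in, NOT the package's fuzzy_tidy(): it only flags
# which strings lie within `threshold` edits of a target string.
flag_close <- function(x, target,
                       threshold = options("fuzzy_threshold")[[1]],
                       ...) {
  # utils::adist() (base R) computes the generalized Levenshtein distance;
  # extra arguments in `...` are forwarded to it unchanged.
  d <- as.vector(utils::adist(x, target, ...))
  # Assumption: a distance at or below the threshold counts as acceptable.
  d <= threshold
}

fruits <- c("apple", "aple", "Apple", "banana")

# Option 1: set a session-wide default once, then call without `threshold`.
options(fuzzy_threshold = 1)
flag_close(fruits, "apple")

# Option 2: override the default for a single call.
flag_close(fruits, "apple", threshold = 0)

# Fine-tuning through `...`: here adist()'s ignore.case argument.
flag_close(fruits, "apple", threshold = 0, ignore.case = TRUE)
```

Because R evaluates default arguments when the function is called, changing the option later in the session affects subsequent calls that do not pass threshold explicitly.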
You can use a template created with fuzzy_match() to control how the messy strings are tidied.
fuzzy_pool(), fuzzy_match(), stringdist()
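# Example data with several misspelled fruit names
test_df <- data.frame(fruit = c("banana", "blueberry", "limon", "pinapple",
                                "apple", "aple", "Apple", "bonana"),
                      number = 1:8)

# Build a lookup template from the messy column and inspect it
fuzzy_template <- fuzzy_match(test_df, fruit)
fuzzy_template

# Tidy the column using the template
fuzzy_tidy(test_df, "fruit", fuzzy_template)

# Tidy the column without a template
fuzzy_tidy(test_df, "fruit")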