View source: R/correct_categories.R
Correct strings to pre-defined strings. This is a wrapper for stringdist approximate string matching with certain parameters preset, so that it can be used easily in a tidyverse pipe. It uses cosine matching by default, which disregards word order.
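To see why a cosine-based distance disregards word order, the following base-R sketch may help. It is not the stringdist implementation; `cosine_dist` is a hypothetical helper that compares single-character counts, under the assumption that this mirrors cosine matching on character profiles with a q-gram size of 1. Base R's `adist()` (edit distance) is shown only for contrast.

```r
# Illustrative stand-in, NOT stringdist's own code:
# cosine distance between the character-count profiles of two strings.
cosine_dist <- function(a, b) {
  chars_a <- strsplit(a, "", fixed = TRUE)[[1]]
  chars_b <- strsplit(b, "", fixed = TRUE)[[1]]
  alphabet <- union(chars_a, chars_b)
  # Count every character of the shared alphabet in each string
  va <- as.numeric(table(factor(chars_a, levels = alphabet)))
  vb <- as.numeric(table(factor(chars_b, levels = alphabet)))
  1 - sum(va * vb) / sqrt(sum(va^2) * sum(vb^2))
}

# Same characters in a different order: the count profiles are identical
swapped <- cosine_dist("publication type", "type publication")

# An edit distance, by contrast, is sensitive to the reordering
edit <- adist("publication type", "type publication")[1, 1]

# A typo stays much closer to its intended term than to an unrelated one
near <- cosine_dist("manuplation", "manipulation")
far  <- cosine_dist("manuplation", "other")
```

Because only character counts enter the calculation, any reordering of the same characters leaves the distance unchanged, while a small typo shifts the profile only slightly.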
correct_categories(to_be_corrected = NULL, correct_terms = NULL,
  max_dist = 2, method = c("cosine", "osa", "lv", "dl", "hamming",
  "lcs", "qgram", "jaccard", "jw", "soundex"), ...)
to_be_corrected | vector containing the strings to be corrected
correct_terms | string vector containing the correct terms
max_dist | maximum distance, passed down to stringdist::amatch(); defaults to 2
method | distance method, passed down to stringdist::amatch(); defaults to "cosine"
... | further parameters to be passed down to stringdist::amatch()
A corrected string vector that can only contain the correct terms
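As a rough illustration of how the arguments above could map onto stringdist::amatch(), here is a minimal sketch. The function name `correct_categories_sketch` is hypothetical and the body is an assumption about the wrapper's behaviour, not the packaged source; only `amatch()` and its `maxDist` and `method` arguments are taken from stringdist itself.

```r
library(stringdist)

# Hypothetical re-implementation for illustration only;
# the packaged correct_categories() may differ in its details.
correct_categories_sketch <- function(to_be_corrected, correct_terms,
                                      max_dist = 2, method = "cosine", ...) {
  # amatch() returns, for each input, the index of the closest entry in
  # correct_terms, or NA when no entry lies within maxDist
  idx <- amatch(to_be_corrected, correct_terms,
                maxDist = max_dist, method = method, ...)
  # Indexing by position means only values from correct_terms (or NA) come back
  correct_terms[idx]
}

reasons <- c("sample characteristics", "publication type",
             "manipulation", "other")
fixed <- correct_categories_sketch(c("manuplation", "publicaton type"), reasons)
```

Since the result is built by indexing into `correct_terms`, it cannot contain anything outside that vector. Note also that a cosine distance between count profiles lies between 0 and 1, so with the default `max_dist = 2` the threshold should never exclude a candidate under this method.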
library(dplyr)

reasons <- c("sample characteristics",
             "publication type",
             "manipulation",
             "other")

# Create category names with typos
reasons_with_typo <- c("simple characteristisc",
                       "publication t",
                       "manuplation",
                       "o",
                       "publicaton type")

# Create a dataset with random correct and incorrect categories in the "reason" column
df_with_typos <-
  workaholism_pubmed %>%
  mutate(decision = sample(c(0, 1), size = nrow(.), replace = TRUE),
         reason = if_else(decision == 0,
                          NA_character_,
                          # Mix correct and incorrect categories
                          sample(c(reasons, reasons_with_typo),
                                 size = nrow(.),
                                 replace = TRUE)
         )
  )

# The typos are corrected in a new column
mutate(df_with_typos, corrected_reason = correct_categories(reason, reasons))