correct_categories: Match strings with a pre-defined set of strings
In nthun/metamanager: Manage Meta-analysis Workflow in R


Corrects strings by matching them to a pre-defined set of strings. This is a wrapper around stringdist's approximate string matching with certain parameters preset, so that it can be used easily in a tidyverse pipe. Cosine matching is used by default so that word order is disregarded.
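The matching step described above can be sketched by calling stringdist::amatch() directly. This is only an illustration of the underlying technique, not the package's actual implementation; the vectors `correct_terms` and `typos` are made up for this sketch, and the values `method = "cosine"` and `maxDist = 2` are assumed to mirror the defaults shown in the usage.

```r
library(stringdist)

# Hypothetical inputs for this sketch
correct_terms <- c("sample characteristics", "publication type",
                   "manipulation", "other")
typos <- c("simple characteristisc", "publicaton type", "manuplation")

# amatch() returns, for each input string, the index of the closest entry
# in the lookup table (or NA if nothing lies within maxDist)
idx <- amatch(typos, correct_terms, method = "cosine", maxDist = 2)
corrected <- correct_terms[idx]

# Cosine distance compares character counts, so swapping the word order
# still leads to the same term
swapped <- correct_terms[amatch("type publication", correct_terms,
                                method = "cosine", maxDist = 2)]
```

Because cosine distance never exceeds 1, a `maxDist` of 2 means that every non-missing input is mapped to its nearest term, however poor the match.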


correct_categories(to_be_corrected = NULL, correct_terms = NULL,
  max_dist = 2, method = c("cosine", "osa", "lv", "dl", "hamming",
  "lcs", "qgram", "jaccard", "jw", "soundex"), ...)

`to_be_corrected`	vector containing the strings to be corrected
`correct_terms`	string vector containing the correct terms
`max_dist`	maximum distance allowed for a match, passed down to stringdist::amatch(); defaults to 2
`method`	distance method passed down to stringdist::amatch(); defaults to "cosine"
`...`	further parameters to be passed down to stringdist::amatch()

A corrected string vector that can only contain the correct terms

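library(dplyr)
# Assumed to provide correct_categories() and the workaholism_pubmed example data
library(metamanager)
# Correct category names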
reasons <- c("sample characteristics",
             "publication type",
             "manipulation",
             "other")

# Create category names with typos
reasons_with_typo <- c("simple characteristisc",
                       "publication t",
                       "manuplation",
                       "o",
                       "publicaton type")

# Create a dataset with random correct and incorrect categories in the "reason" column
df_with_typos <-
  workaholism_pubmed %>%
  mutate(decision = sample(c(0, 1), size = nrow(.), replace = TRUE),
         reason = if_else(decision == 0,
                          NA_character_,
                          # Mix correct and incorrect categories
                          sample(c(reasons, reasons_with_typo),
                                 size = nrow(.),
                                 replace = TRUE)))

# The typos are corrected in a new column
mutate(df_with_typos, corrected_reason = correct_categories(reason, reasons))