Spellcheck: Spellchecker

View source: R/Spellcheck.R

Spellcheck    R Documentation

Spellchecker

Description

This function spellchecks a single word and works well with mapply. It is specific to a process in which we have a list of uncleaned (raw) responses and a list of cleaned responses (e.g., preprocessed with steps such as the lemmatizer or symbol removal). We also need a dictionary.

The Spellchecker works this way. First, if the cleaned input word is in the dictionaries, return it: we have found a word that can be coded into a dictionary and thus does not need further spellchecking. If the word is not in the dictionaries and the uncleaned response has spaces or dashes, return the uncleaned response, because the spellchecker can only correct single words. If the word is NA, transform it to the string "na" and return it. If the cleaned word is correctly spelled but not in the dictionaries, return the cleaned word.

If none of the above applies, proceed to spellcheck, using the top 5 suggestions that hunspell provides for the correct spelling. If there are no suggestions, return the incorrectly spelled preprocessed response. If there are suggestions, iterate over them. If the current suggestion is the same as the uncleaned response, return it, preprocessed. If the current suggestion is a word provided by another participant (rawlist parameter), return that word, preprocessed. The logic of this step is that a response provided by another participant is more likely to be the spelling meant for the current response, since it was given in a similar context. If none of the suggestions meets those requirements, return the first suggestion, which tends to be hunspell's best suggestion.
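The decision order described above can be sketched in a few lines of R. This is an illustrative reconstruction from the description only, not the package source. The names spellcheck_sketch, is_correct, suggest, and preprocess are hypothetical: is_correct and suggest stand in for a spellchecking backend (for example hunspell::hunspell_check and hunspell::hunspell_suggest), and preprocess stands in for the cleaning pipeline. The sketch also assumes that "word is NA" refers to the cleaned word, and it moves the NA check to the front so that later comparisons never see NA.

```r
# Illustrative sketch of the decision order; NOT the SADCAT implementation.
# is_correct(word) -> TRUE/FALSE and suggest(word) -> character vector are
# hypothetical stand-ins for a hunspell-style backend.
spellcheck_sketch <- function(raw, cleaned, dict_cleaned, rawlist,
                              is_correct, suggest, preprocess = tolower) {
  # Assumption: the NA check applies to the cleaned word and is done first.
  if (is.na(cleaned)) return("na")
  # Already codable into a dictionary: nothing to correct.
  if (cleaned %in% dict_cleaned) return(cleaned)
  # Multi-word or dashed responses cannot be corrected as a single word.
  if (!is.na(raw) && grepl("[ -]", raw)) return(raw)
  # Correctly spelled, just not a dictionary word.
  if (is_correct(cleaned)) return(cleaned)

  # Consider at most the top 5 suggestions.
  suggestions <- head(suggest(cleaned), 5)
  if (length(suggestions) == 0) return(cleaned)
  for (s in suggestions) {
    # Prefer a suggestion equal to the raw response or given by a participant.
    if (identical(s, raw) || s %in% rawlist) return(preprocess(s))
  }
  # Otherwise fall back to the first suggestion.
  suggestions[1]
}

# Toy backend so the sketch runs without hunspell installed (made-up data).
known <- c("friendly", "honest", "warm", "cold")
toy_is_correct <- function(w) w %in% known
toy_suggest <- function(w) {
  if (identical(w, "frendly")) c("friendly", "fondly") else character(0)
}

dict    <- c("warm", "cold")
rawlist <- c("friendly", "honest")
run <- function(raw, cleaned) {
  spellcheck_sketch(raw, cleaned, dict, rawlist, toy_is_correct, toy_suggest)
}

r_dict    <- run("Warm", "warm")                 # in the dictionary
r_dash    <- run("well-meaning", "wellmeaning")  # raw contains a dash
r_na      <- run(NA_character_, NA_character_)   # NA input
r_correct <- run("honest", "honest")             # correct, not in dictionary
r_other   <- run("frendly", "frendly")           # suggestion found in rawlist
r_none    <- run("xqzt", "xqzt")                 # no suggestions
```

Passing the checker and suggester as arguments keeps the sketch testable; in real use those two roles would be filled by the spellchecking library.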

Usage

Spellcheck(raw, cleaned, dict_cleaned, rawlist)
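
Because the function handles one word per call, several responses are typically processed with mapply. The following is a minimal sketch of that pattern, assuming the function is exported as SADCAT::Spellcheck and accepts plain character vectors for dict_cleaned and rawlist. The data are made up, and the fallback function is a hypothetical placeholder (it simply returns the cleaned word) so that the example still runs when the package is not installed.

```r
# Made-up example data: raw responses and their cleaned counterparts.
raw_responses     <- c("Frendly", "Honest", "warm")
cleaned_responses <- c("frendly", "honest", "warm")
dict_words        <- c("friendly", "honest", "warm")

# Assumption: SADCAT exports Spellcheck(). The fallback is a placeholder
# with the same argument names; it does no spellchecking at all.
spell_fun <- if (requireNamespace("SADCAT", quietly = TRUE)) {
  SADCAT::Spellcheck
} else {
  function(raw, cleaned, dict_cleaned, rawlist) cleaned
}

# raw and cleaned are iterated in parallel; the dictionary and the list of
# participant responses are passed unchanged to every call via MoreArgs.
checked <- mapply(
  spell_fun,
  raw = raw_responses,
  cleaned = cleaned_responses,
  MoreArgs = list(dict_cleaned = dict_words, rawlist = raw_responses),
  USE.NAMES = FALSE
)
```

Putting dict_cleaned and rawlist in MoreArgs matters: if they were passed alongside raw and cleaned, mapply would iterate over them element by element instead of handing the whole vector to each call.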

Arguments

raw

uncleaned word response to spellcheck; for multiple responses, use a loop, an apply-family function (e.g., mapply), or dplyr

cleaned

cleaned word response to spellcheck; for multiple responses, use a loop, an apply-family function (e.g., mapply), or dplyr

dict_cleaned

a list of dictionary words that have been preprocessed

rawlist

a list of words that were provided by participants, from which uncleaned responses are being obtained

Value

the spellchecked word (one per call)


gandalfnicolas/SADCAT documentation built on June 8, 2024, 6:26 a.m.