best.guess: Makes Best Guess for Spelling Correction

Description

A wrapper function that returns the best guess for a misspelled word based on its letters, the ordering of those letters, and the potential for letters to be interchanged. The Damerau-Levenshtein distance is used to infer which word in a dictionary the participant was trying to spell (see SemNetDictionaries).

Usage

best.guess(word, full.dictionary, dictionary = NULL, tolerance = 1)

Arguments

word

Character. The word for which to get best-guess spelling options from the dictionary

full.dictionary

Character vector. The dictionary to search for best guesses in. See SemNetDictionaries

dictionary

Character. A dictionary from SemNetDictionaries for monikers (enhances guessing)

tolerance

Numeric. The distance tolerance set for automatic spell-correction purposes. The function stringdist computes the Damerau-Levenshtein (DL) distance, which is used to determine potential best guesses.

If exactly one word (i.e., n = 1) is within the distance tolerance, it is automatically output as the best guess. This default is based on Damerau's (1964) observation that more than 80% of all human misspellings can be expressed by a single error (e.g., an insertion, deletion, substitution, or transposition). If more than one word is within the distance tolerance, these words are provided as potential options.

The recommended and default distance tolerance is tolerance = 1, which spell-corrects a word only if exactly one word has a DL distance of 1.
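
The tolerance rule described above can be illustrated with a minimal base R sketch. This is not the package's implementation: osa.distance, guess.sketch, and toy.dictionary are hypothetical names introduced here, and the hand-written optimal string alignment distance (a restricted form of the Damerau-Levenshtein distance) stands in for the stringdist call so that the sketch runs without extra packages.

```r
# Optimal string alignment distance: counts insertions, deletions,
# substitutions, and transpositions of adjacent letters.
# (Hypothetical helper; a stand-in for stringdist.)
osa.distance <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  # d[i + 1, j + 1] is the distance between the first i letters of a
  # and the first j letters of b
  d <- matrix(0, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- as.numeric(x[i] != y[j])
      d[i + 1, j + 1] <- min(
        d[i, j + 1] + 1,  # deletion
        d[i + 1, j] + 1,  # insertion
        d[i, j] + cost    # substitution (or match)
      )
      # transposition of two adjacent letters
      if (i > 1 && j > 1 && x[i] == y[j - 1] && x[i - 1] == y[j]) {
        d[i + 1, j + 1] <- min(d[i + 1, j + 1], d[i - 1, j - 1] + 1)
      }
    }
  }
  d[n + 1, m + 1]
}

# Return every dictionary word within the tolerance (hypothetical helper).
# One returned word can be treated as an automatic correction;
# several returned words are only potential options.
guess.sketch <- function(word, full.dictionary, tolerance = 1) {
  distances <- vapply(full.dictionary, osa.distance, numeric(1),
                      b = word, USE.NAMES = FALSE)
  unique(full.dictionary[distances <= tolerance])
}

# Toy dictionary invented for illustration
toy.dictionary <- c("bombay", "bobcat", "bat", "cat")

guess.sketch("bomba", toy.dictionary)  # one candidate: "bombay"
guess.sketch("at", toy.dictionary)     # two candidates: "bat" "cat"
```

With tolerance = 1, "bomba" has a single candidate and could be corrected automatically, whereas "at" is one edit away from two words and so yields options rather than a correction.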

Value

The best guess(es) of the word

Author(s)

Alexander Christensen <alexpaulchristensen@gmail.com>

References

Damerau, F. J. (1964). A technique for computer detection and correction of spelling errors. Communications of the ACM, 7, 171-176.

Examples

# Misspelled "bombay"
best.guess("bomba", full.dictionary = SemNetDictionaries::animals.dictionary)

SemNetCleaner documentation built on Sept. 16, 2021, 5:12 p.m.