SDkeeper: Pre-creates a data.table or a ternary search tree
In TSTr: Ternary Search Tree for Auto-Completion and Spell Checking


Pre-calculation step for symmetric delete spelling correction. Creates a data.table or a ternary search tree that stores the symmetrical deletions of the dictionary terms.

SDkeeper(input, maxdist, useTST = FALSE)

`input`	a filepath to read from or a character vector containing the strings from which to create the symmetrical deletions.
`maxdist`	the maximum distance to use for spell checking. The literature on spelling correction claims that around 80% of spelling errors are at an edit distance of 1 from the target, and 99% at an edit distance of 2. SDkeeper accepts a distance between 1 and 3.
`useTST`	specifies whether a TST is used to store the symmetrical deletions. The default is FALSE, in which case an indexed data.table is used instead (better performance).

Generates terms with an edit distance <= maxdist (deletes only) from each dictionary term and adds them, together with the original term, to the dictionary. This has to be done only once, during a pre-calculation step.
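
The delete-generation step described above can be sketched in base R. This is only an illustration under stated assumptions: `deletes_upto` is a hypothetical helper, not a TSTr function, and the package's internal implementation may differ.

```r
# Hypothetical helper (not part of TSTr): returns every distinct string that
# can be obtained from `word` by deleting between 1 and `maxdist` characters.
deletes_upto <- function(word, maxdist) {
  found <- character(0)
  frontier <- word                      # strings produced at the previous distance
  for (d in seq_len(maxdist)) {
    nxt <- character(0)
    for (w in frontier) {
      n <- nchar(w)
      for (i in seq_len(n)) {
        # drop the i-th character of w
        nxt <- c(nxt, paste0(substr(w, 1, i - 1), substr(w, i + 1, n)))
      }
    }
    frontier <- unique(nxt)
    found <- union(found, frontier)
  }
  found
}

deletes_upto("apple", 1)
```

With a distance of 1, the deletes of "apple" include "aple", which is presumably why a lookup of the misspelling "aple" can be traced back to the dictionary term "apple".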

For a word of length n and an edit distance of 1, there are just n deletions, for a total of n terms at search time, whatever the alphabet size a. This is three orders of magnitude less expensive (36 terms for n=9 and d=2) than Peter Norvig's approach, and it is language independent (the alphabet is not required to generate deletes). The cost of this approach is the pre-calculation time and the storage space for the deletes generated from every original dictionary entry, which is acceptable in most cases.
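
The counts quoted above can be checked with a little arithmetic. The sketch below assumes a word whose characters are all distinct, so no two deletions collide; `max_deletes` is a hypothetical name used only for illustration. Under that assumption, the figure of 36 appears to correspond to the number of ways of deleting exactly two of nine characters.

```r
# Hypothetical helper: upper bound on the number of delete variants of an
# n-character word for distances 1..maxdist (reached when all characters differ).
max_deletes <- function(n, maxdist) {
  sum(choose(n, seq_len(maxdist)))
}

choose(9, 1)       # single deletions of a 9-character word
choose(9, 2)       # double deletions of a 9-character word
max_deletes(9, 2)  # both distances combined
```

Words with repeated letters yield fewer distinct variants, since deleting either of two adjacent identical characters gives the same string.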

An object of class 'data.table' or 'tstTree' storing the symmetrical deletions of the specified distance.

SDcheck

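library(TSTr)
# Store the deletions in an indexed data.table (the default)
fruitTree <- SDkeeper(c("apple", "orange", "lemon"), 2)
# Store the deletions in a ternary search tree instead
fruitTree <- SDkeeper(c("apple", "orange", "lemon"), 1, useTST = TRUE)
SDcheck(fruitTree, "aple")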