SDkeeper: Pre-creates a data.table or a ternary search tree



Description

Pre-calculation step for symmetric delete spelling correction. Creates a data.table or a ternary search tree that stores the symmetrical deletions of the dictionary terms.

Usage

SDkeeper(input, maxdist, useTST = FALSE)

Arguments

input

a filepath to read from or a character vector containing the strings from which to create the symmetrical deletions.

maxdist

the maximum distance to use for spell checking. The literature on spelling correction claims that around 80% of spelling errors are within an edit distance of 1 of the target, and 99% within an edit distance of 2. SDkeeper accepts a distance between 1 and 3.

useTST

specifies whether a TST is used to store the symmetrical deletions. The default is FALSE, in which case an indexed data.table is used instead (better performance).

Details

Generates terms with an edit distance <= maxdist (deletes only) from each dictionary term and adds them, together with the original term, to the dictionary. This has to be done only once, during a pre-calculation step.
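The generation step can be sketched in base R as follows. This is only an illustration of the idea, not the package's implementation: the helper names delete_variants and build_deletes, and the two-column layout (key, term), are assumptions made for this example.

```r
# Hypothetical helper, not part of TSTr: all distinct strings obtained from
# `word` by deleting between 1 and `maxdist` characters.
delete_variants <- function(word, maxdist) {
  frontier <- word
  found <- character(0)
  for (d in seq_len(maxdist)) {
    next_frontier <- character(0)
    for (w in frontier) {
      n <- nchar(w)
      if (n < 1) next
      # Drop the i-th character for every position i
      pieces <- vapply(seq_len(n), function(i) {
        paste0(substr(w, 1, i - 1), substr(w, i + 1, n))
      }, character(1))
      next_frontier <- c(next_frontier, pieces)
    }
    frontier <- unique(next_frontier)
    found <- unique(c(found, frontier))
  }
  found
}

# Hypothetical helper: map every deletion, and the term itself, to its source term.
build_deletes <- function(terms, maxdist) {
  rows <- lapply(terms, function(term) {
    data.frame(key = c(term, delete_variants(term, maxdist)),
               term = term, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

deletes <- build_deletes(c("apple", "orange", "lemon"), 1)
deletes[deletes$key == "aple", ]  # the row linking "aple" back to "apple"
```

Because "apple" contains a repeated letter, deleting either "p" gives the same string, so the table holds fewer deletions than the word has characters.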

For a word of length n and an edit distance of 1, there will be at most n deletions, for a total of at most n terms at search time; the alphabet size a does not affect this count. This is three orders of magnitude less expensive (36 terms for n=9 and d=2) than Peter Norvig's approach, and it is language independent (the alphabet is not required to generate deletes). The cost of this approach is the pre-calculation time and the storage space for the x deletes of every original dictionary entry, which is acceptable in most cases.
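The counts can be checked with a short calculation. The sketch below assumes a 26-letter alphabet and uses the distance-1 breakdown from Norvig's description of his corrector (n deletions, n-1 transpositions, a*n replacements, a*(n+1) insertions); the function names are hypothetical.

```r
# Upper bound on the number of strings produced by deleting exactly k characters
# from a word of length n (repeated letters can make some of them coincide).
delete_count <- function(n, k) choose(n, k)

# Candidates at distance 1 when deletes, transposes, replaces and inserts
# are all enumerated over an alphabet of size a.
full_edit_count <- function(n, a) n + (n - 1) + a * n + a * (n + 1)

n <- 9
delete_count(n, 1)          # deletes of one character
delete_count(n, 2)          # deletes of two characters
sum(delete_count(n, 1:2))   # all deletes up to distance 2
full_edit_count(n, a = 26)  # distance-1 candidates, assuming 26 letters
```

For n=9 this gives 9 one-character deletes and 36 two-character deletes, against 511 distance-1 candidates when every edit type is enumerated.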

Value

An object of class 'data.table' or 'tstTree' storing the symmetrical deletions of the specified distance.

See Also

SDcheck

Examples

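library(TSTr)
# Default storage: indexed data.table with deletions up to distance 2
fruitTable <- SDkeeper(c("apple", "orange", "lemon"), 2)
# Alternative storage: ternary search tree with deletions up to distance 1
fruitTree <- SDkeeper(c("apple", "orange", "lemon"), 1, useTST = TRUE)
SDcheck(fruitTree, "aple")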

TSTr documentation built on May 1, 2019, 9:16 p.m.