Pre-calculation step for symmetric delete spelling correction. Creates a data.table or a ternary search tree (TST) that stores the symmetrical deletions of the dictionary terms.
input | a file path to read from, or a character vector containing the strings from which to create the symmetrical deletions.
maxdist | the maximum edit distance to use for spell checking. The literature on spelling correction claims that around 80% of spelling errors are within an edit distance of 1 from the target, and 99% within an edit distance of 2. SDkeeper supports distances between 1 and 3.
useTST | whether to store the symmetrical deletions in a TST. Default is FALSE, in which case an indexed data.table is used instead (better performance).
Generates terms with an edit distance <= maxdist (deletes only) from each dictionary term and adds them, together with the original term, to the dictionary. This has to be done only once, during a pre-calculation step.
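The pre-calculation described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the helper names `generate_deletes` and `build_deletion_table` are hypothetical, and a plain data.frame stands in for the indexed data.table or TST.

```r
# Hypothetical helper (not part of the package): every string reachable from
# `word` by deleting between 1 and `maxdist` characters.
generate_deletes <- function(word, maxdist = 2L) {
  frontier <- word
  found <- character(0)
  for (d in seq_len(maxdist)) {
    next_frontier <- character(0)
    for (w in frontier) {
      n <- nchar(w)
      if (n == 0L) next
      # Drop the i-th character, for every position i.
      dels <- vapply(
        seq_len(n),
        function(i) paste0(substr(w, 1L, i - 1L), substr(w, i + 1L, n)),
        character(1)
      )
      next_frontier <- c(next_frontier, dels)
    }
    frontier <- unique(next_frontier)
    found <- union(found, frontier)
  }
  found
}

# Hypothetical helper: map each deletion (and the term itself) back to the
# dictionary term it came from.
build_deletion_table <- function(dictionary, maxdist = 2L) {
  rows <- lapply(dictionary, function(term) {
    data.frame(
      deletion = c(term, generate_deletes(term, maxdist)),
      term = term,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

tab <- build_deletion_table(c("cat", "cart"), maxdist = 1L)
# Both "cat" itself and "cart" (with the "r" deleted) map to the key "cat".
tab[tab$deletion == "cat", "term"]
```

At lookup time the same deletes would be generated for the input word and matched against the `deletion` column, which is why the table only has to be built once.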
For a word of length n, an alphabet of size a and an edit distance of 1, there are just n deletions, for a total of n terms at search time. This is about three orders of magnitude less expensive than Peter Norvig's approach (for n=9 and d=2 there are 36 double deletes in addition to the 9 single deletes), and it is language independent, since the alphabet is not required to generate deletes. The cost of this approach is the pre-calculation time and the storage space for x deletes per original dictionary entry, where x depends on the word length and on maxdist; this is acceptable in most cases.
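The arithmetic behind this comparison can be checked with a short base R sketch. The function names are hypothetical. `delete_count` gives an upper bound that assumes all characters of the word are distinct, and `norvig_edit1_count` assumes the usual way of counting candidates at edit distance 1 (deletions, transpositions, replacements and insertions, duplicates included).

```r
# Upper bound on the number of deletes within distance d for a word of
# length n: choose(n, 1) + ... + choose(n, d).
delete_count <- function(n, d) sum(choose(n, seq_len(d)))

# Candidates at edit distance 1 when the alphabet of size a is needed:
# n deletions, n - 1 transpositions, a * n replacements, a * (n + 1) insertions.
norvig_edit1_count <- function(n, a) n + (n - 1) + a * n + a * (n + 1)

delete_count(9, 1)         # 9 single deletes
delete_count(9, 2)         # 45 = 9 single deletes + 36 double deletes
norvig_edit1_count(9, 36)  # 701 candidates at distance 1 alone
```

The delete count depends only on n and d, while the alphabet-based count grows with a at every edit step, which is where the gap at distance 2 comes from.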
An object of class 'data.table' or 'tstTree' storing the symmetrical deletions of the specified distance.