Tokenize (or split) text and emit multi-grams.
n | length, in words, of each n-gram |
ignoreCase | logical: if FALSE, n-gram matching is case sensitive; if TRUE, case is ignored during matching. |
delimiter | character or string that divides one word from the next. You can use a regular expression as the |
punctuation | a regular expression that specifies the punctuation characters the parser removes before it evaluates the input text. |
overlapping | logical: if TRUE, overlapping n-grams are allowed. |
reset | a regular expression listing one or more punctuation characters or strings, any of which the |
sep | a character string used to separate multiple text columns. |
minLength | minimum length of the words in an n-gram. N-grams that contain words shorter than this limit are omitted. The current implementation is incomplete: it only filters out n-grams in which every word is below the minimum length, i.e. the total length of the n-gram is below n*minLength + (n-1). |
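
The total-length check described for minLength can be worked through with a small sketch. The helper `passes_total_length` is hypothetical and not part of the package. It assumes the words of an n-gram are joined by single spaces, which is where the `(n-1)` term comes from.

```r
# Hypothetical helper (not part of the package): applies the total-length
# check described for minLength, assuming words are joined by single spaces.
passes_total_length <- function(ngram, n, minLength) {
  nchar(ngram) >= n * minLength + (n - 1)
}

n <- 2
minLength <- 3
threshold <- n * minLength + (n - 1)  # 2 * 3 + 1 = 7

# Both words are shorter than minLength: 5 characters, below the threshold.
short_both <- passes_total_length("an ox", n, minLength)

# Only one word is shorter than minLength: 7 characters, so the n-gram
# passes the total-length check even though "a" is below the limit.
short_one <- passes_total_length("a zebra", n, minLength)
```

Under these assumptions `short_both` is FALSE and `short_one` is TRUE, which illustrates why the documented filter is described as incomplete: an n-gram with one short word can still be kept.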
Value: a pluggable n-gram parser.
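
To make the effect of the main arguments concrete, here is a minimal base R sketch of n-gram splitting. It is not the package's parser: `simple_ngrams` is a hypothetical stand-alone function, its default delimiter is an assumption, and treating `ignoreCase` as lower-casing the input is a simplification chosen for illustration.

```r
# Hypothetical stand-alone sketch; it only mirrors the argument names
# documented above and does not reproduce the package's parser.
simple_ngrams <- function(text, n, ignoreCase = FALSE, delimiter = "[ \t]+",
                          punctuation = NULL, overlapping = TRUE) {
  # Assumption: ignoring case is approximated by lower-casing the text.
  if (ignoreCase) text <- tolower(text)
  # Remove punctuation characters before splitting into words.
  if (!is.null(punctuation)) text <- gsub(punctuation, "", text)
  # Split on the delimiter (a regular expression) and drop empty tokens.
  words <- strsplit(trimws(text), delimiter)[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  # Overlapping n-grams advance one word at a time, otherwise n words.
  step <- if (overlapping) 1 else n
  starts <- seq(1, length(words) - n + 1, by = step)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

txt <- "The quick, brown fox jumps."
bigrams_overlap <- simple_ngrams(txt, n = 2, ignoreCase = TRUE,
                                 punctuation = "[.,]")
bigrams_disjoint <- simple_ngrams(txt, n = 2, ignoreCase = TRUE,
                                  punctuation = "[.,]", overlapping = FALSE)
```

In this sketch the overlapping call yields four 2-grams ("the quick", "quick brown", "brown fox", "fox jumps"), while the non-overlapping call yields two ("the quick", "brown fox") and discards the trailing word that cannot complete a 2-gram.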