Weka_tokenizers: R Interfaces to Weka Tokenizers

Description:

R interfaces to Weka tokenizers.

Usage:

AlphabeticTokenizer(x, control = NULL)
NGramTokenizer(x, control = NULL)
WordTokenizer(x, control = NULL)
Arguments:

x: a character vector with strings to be tokenized.

control: an object of class Weka_control giving control options for the tokenizer, or NULL (default: no options).
Details:

AlphabeticTokenizer is an alphabetic string tokenizer: tokens are formed only from contiguous alphabetic sequences.

NGramTokenizer splits strings into n-grams with given minimal and maximal numbers of grams.

WordTokenizer is a simple word tokenizer.
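For NGramTokenizer, the minimal and maximal numbers of grams can be supplied through control. A minimal sketch, assuming RWeka's Weka_control() constructor and the underlying Weka tokenizer options min and max:

## Request all 2-grams and 3-grams (maps to Weka's -min and -max options).
ctrl <- RWeka::Weka_control(min = 2, max = 3)
RWeka::NGramTokenizer("the quick brown fox", control = ctrl)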
Value:

A character vector with the tokenized strings.
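Examples:

The sample sentences below are illustrative; a working Java runtime and the RWeka package are assumed.

library(RWeka)  # the tokenizers below are exported by RWeka

## Simple word tokenization.
WordTokenizer("The quick, brown fox.")

## Alphabetic sequences only: digits and punctuation do not enter tokens.
AlphabeticTokenizer("R2-D2 says hello")

## All 1-grams through 3-grams of the word sequence.
NGramTokenizer("The quick brown fox", Weka_control(min = 1, max = 3))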