tokenizers | R Documentation
A simple Unicode alphabetic tokenizer.
Unicode_alphabetic_tokenizer(x)
x    a character vector.
Tokenization first replaces the elements of x by their Unicode
character sequences. Then the non-alphabetic characters (i.e., those
that do not have the Unicode Alphabetic property) are replaced by
blanks, and the resulting strings are split at the blanks.
A character vector with the tokenized strings.
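The steps described above can be approximated in base R. The following is a minimal sketch, not the package's implementation: the helper name sketch_alphabetic_tokenizer is hypothetical, the regular-expression class \p{L} (Unicode letters) is used as a stand-in for the Alphabetic property, and the result is assumed to be a single flattened character vector.

```r
# Hypothetical helper illustrating the idea; not the package's own code.
# Assumption: \p{L} (Unicode letters) approximates the Alphabetic property,
# which additionally covers some letter-like numbers and marks.
sketch_alphabetic_tokenizer <- function(x) {
  stopifnot(is.character(x))
  # Replace each run of non-letter characters by a single blank.
  blanked <- gsub("\\P{L}+", " ", x, perl = TRUE)
  # Split at the blanks and flatten the per-element results.
  tokens <- as.character(unlist(strsplit(blanked, " ", fixed = TRUE),
                                use.names = FALSE))
  # Drop empty tokens produced by leading blanks.
  tokens[nzchar(tokens)]
}

sketch_alphabetic_tokenizer(c("Hello, world!", "R2D2 rocks"))
```

Digits and punctuation act as separators in this sketch, so "R2D2" yields the tokens "R" and "D". Results from Unicode_alphabetic_tokenizer may differ for characters that have the Alphabetic property but are not letters.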