tokenizers (R Documentation)
These functions each turn a text into tokens. The tokenize_ngrams function returns shingled n-grams; the sketch after the usage block below illustrates the idea.
tokenize_words(string, lowercase = TRUE)
tokenize_sentences(string, lowercase = TRUE)
tokenize_ngrams(string, lowercase = TRUE, n = 3)
tokenize_skip_ngrams(string, lowercase = TRUE, n = 3, k = 1)
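"Shingled" means that successive n-grams overlap by n - 1 words, like shingles on a roof. The call below is a hedged sketch using a made-up input string; the exact output format may differ between package versions.

tokenize_ngrams("one two three four", n = 2)
## Expect the overlapping bigrams: "one two" "two three" "three four"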
string: A character vector of length 1 to be tokenized.

lowercase: Should the tokens be made lower case?

n: For n-gram tokenizers, the number of words in each n-gram.

k: For the skip n-gram tokenizer, the maximum skip distance between words. The function will compute all skip n-grams for skip distances between 0 and k (see the sketch below).
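To clarify the skip distance, here is a minimal base R sketch of skip bigrams. It is a conceptual model only, not the package's implementation, and skip_bigrams is a hypothetical helper written for this illustration: with maximum skip distance k, each word is paired with every word up to k + 1 positions ahead of it.

# Conceptual sketch only; skip_bigrams is not part of the tokenizers package.
skip_bigrams <- function(words, k = 1) {
  out <- character(0)
  for (i in seq_along(words)) {
    # Pair word i with each of the next k + 1 words,
    # i.e. skipping between 0 and k intervening words.
    for (j in seq_len(k + 1)) {
      if (i + j <= length(words)) {
        out <- c(out, paste(words[i], words[i + j]))
      }
    }
  }
  out
}

skip_bigrams(c("the", "quick", "brown", "fox"), k = 1)
## "the quick" "the brown" "quick brown" "quick fox" "brown fox"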
Each function returns a character vector containing the tokens. These functions strip all punctuation, as the example below illustrates.
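A small hedged illustration of the punctuation stripping; the exact tokens may depend on the package version.

tokenize_words("Stop! Drop, and roll.")
## Expect roughly: "stop" "drop" "and" "roll"

The examples below apply each tokenizer to the same sentence.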
dylan <- "How many roads must a man walk down? The answer is blowin' in the wind."
tokenize_words(dylan)
tokenize_sentences(dylan)
tokenize_ngrams(dylan, n = 2)
tokenize_skip_ngrams(dylan, n = 3, k = 2)