Description
Split each element of a character vector by 'split_re' into its constituent 'ngram' tokens.
Usage

tokenize_text(strings, ngram, split_re = " ", ...)
Arguments

strings    character vector of text documents to be tokenized.
ngram      positive integer specifying the size of the ngram chunks.
split_re   regular expression giving the token boundary on which each string is split.
...        named arguments passed to 'strsplit()'.
Value

If 'length(strings) == 1', returns a character vector of 'ngram' tokens. If 'length(strings) > 1', returns a list, each element of which is a character vector of 'ngram' tokens.
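This page does not show the implementation, but the documented behavior for a single string can be sketched with base R's 'strsplit()'. The helper name 'tokenize_one' is hypothetical and only illustrates the split-then-window logic; the package's actual code may differ.

```r
# Hypothetical sketch: split one string on a regex, then collapse
# each run of 'ngram' consecutive words into a token.
tokenize_one <- function(string, ngram, split_re = " ", ...) {
  if (is.na(string)) return(NA_character_)
  words <- strsplit(string, split_re, ...)[[1]]  # '...' reaches strsplit(), e.g. fixed = TRUE
  n <- length(words) - ngram + 1
  if (n < 1) return(character(0))               # too few words for even one ngram
  vapply(seq_len(n), function(i)
    paste(words[i:(i + ngram - 1)], collapse = " "),
    character(1))
}

tokenize_one("me lava me dootzi", 2)
# c("me lava", "lava me", "me dootzi")
```

A vectorized wrapper would apply this over 'strings' and return the bare vector when the input has length one, matching the Value section above.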
Examples

string <- "hai mi name timi + me girl nam dootza--tza"

# unigrams and bigrams, splitting on the default single space
tokenize_text(string, 1)
tokenize_text(string, 2)

# all ngram sizes from 1 to 3
lapply(1:3, function(x) tokenize_text(string, x))

# split on a space or a hyphen
tokenize_text(string, 2, "[ -]")

# split on a literal dot, via a regex escape or via fixed = TRUE
tokenize_text("me.lava.me.dootzi", 3, "\\.")
tokenize_text("me.lava.me.dootzi", 3, ".", fixed = TRUE)
tokenize_text(rep("me.lava.me.dootzi", 2), 3, ".", fixed = TRUE)

# multi-element input returns a list of token vectors
tokenize_text(c(string, "waow me fillin heppi meby beby"), 3)

# NA and empty-string elements
tokenize_text(c(string, "waow me fillin heppi meby beby", NA), 3)
tokenize_text(c(string, "waow me fillin heppi meby beby", ""), 3)
tokenize_text(NA, 3)