View source: R/strj-tokenize.R
strj_tokenize | R Documentation |
Splits text into tokens using the specified tokenizer.
strj_tokenize(
text,
format = c("list", "data.frame"),
engine = c("stringi", "budoux", "tinyseg", "mecab", "sudachipy"),
rcpath = NULL,
mode = c("C", "B", "A"),
split = FALSE
)
text |
Character vector to be tokenized. |
format |
Output format. Choose 'list' or 'data.frame'. |
engine |
Tokenizer name. Choose one of 'stringi', 'budoux', 'tinyseg', 'mecab', or 'sudachipy'. Note that the specified tokenizer must be installed and available when you use 'mecab' or 'sudachipy'. |
rcpath |
Path to a settings file for 'MeCab' or 'sudachipy', if any. |
mode |
Splitting mode for 'sudachipy'. |
split |
Logical. If passed as |
A list or a data.frame.
strj_tokenize(
paste0(
"\u3042\u306e\u30a4\u30fc\u30cf\u30c8",
"\u30fc\u30f4\u30a9\u306e\u3059\u304d",
"\u3068\u304a\u3063\u305f\u98a8"
)
)
strj_tokenize(
paste0(
"\u3042\u306e\u30a4\u30fc\u30cf\u30c8",
"\u30fc\u30f4\u30a9\u306e\u3059\u304d",
"\u3068\u304a\u3063\u305f\u98a8"
),
format = "data.frame"
)
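The examples above rely on the default arguments. As a minimal sketch, the call below names the engine and output format explicitly and passes a character vector with more than one element. It assumes that strj_tokenize() is loaded from the 'audubon' package and that the package is installed; the variable names are illustrative only.

```r
# Assumption: strj_tokenize() is provided by the 'audubon' package.
library(audubon)

# Two input strings, written with Unicode escapes as in the examples above.
text <- c(
  "\u3042\u306e\u30a4\u30fc\u30cf\u30c8\u30fc\u30f4\u30a9\u306e",
  "\u3059\u304d\u3068\u304a\u3063\u305f\u98a8"
)

# 'stringi' is the first engine listed in the usage signature;
# it is named here only to make the choice visible.
tokens_list <- strj_tokenize(text, format = "list", engine = "stringi")
tokens_df <- strj_tokenize(text, format = "data.frame", engine = "stringi")

# Inspect the structure of each result rather than assuming its layout.
str(tokens_list)
str(tokens_df)
```

Switching engine to 'mecab' or 'sudachipy' in the same call should work only when that tokenizer is installed and available, as noted for the engine argument.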