View source: R/nlp_tokenize_text.R
nlp_tokenize_text    R Documentation
Description:

This function tokenizes text data from a data frame using the 'tokenizers' package, preserving original text features such as capitalization and punctuation.
Usage:

nlp_tokenize_text(
  tif,
  text_hierarchy = c("doc_id", "paragraph_id", "sentence_id")
)
Arguments:

tif: A data frame containing the text to be tokenized and a document identifier column 'doc_id'.

text_hierarchy: A character vector specifying the grouping columns that define the text hierarchy, ordered from highest to lowest level (by default: doc_id, paragraph_id, sentence_id).
Value:

A named list of tokens, where each list item corresponds to a document.
Examples:

tif <- data.frame(
  doc_id = c('1', '1', '2'),
  sentence_id = c('1', '2', '1'),
  text = c("Hello world.",
           "This is an example.",
           "This is a party!")
)

tokens <- nlp_tokenize_text(tif, text_hierarchy = c('doc_id', 'sentence_id'))
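To illustrate what the example above produces, here is a minimal base-R sketch that mimics the function's behavior without the 'tokenizers' dependency. The helper name `sketch_tokenize`, the exact splitting regex, and the assumption that list names are built by pasting the hierarchy ids with "." are illustrative assumptions, not the package's actual implementation; the real function may split tokens slightly differently.

```r
# Illustrative sketch only: approximates nlp_tokenize_text() with base R.
# Assumption: list names join the hierarchy ids with "." (e.g. "1.2").
tif <- data.frame(doc_id = c('1', '1', '2'),
                  sentence_id = c('1', '2', '1'),
                  text = c("Hello world.",
                           "This is an example.",
                           "This is a party!"))

sketch_tokenize <- function(tif, text_hierarchy) {
  # Build one name per row from the hierarchy columns
  key <- apply(tif[, text_hierarchy, drop = FALSE], 1, paste, collapse = ".")
  # Keep capitalization and treat punctuation as separate tokens
  toks <- regmatches(tif$text, gregexpr("[[:alnum:]']+|[[:punct:]]", tif$text))
  names(toks) <- key
  toks
}

tokens <- sketch_tokenize(tif, c("doc_id", "sentence_id"))
tokens[["1.1"]]  # c("Hello", "world", ".")
```

Note how punctuation survives as its own token and the original casing is untouched, which is the behavior the description above attributes to nlp_tokenize_text().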