split_text                R Documentation

Description:

Splits texts into segments of a maximum number of bytes.
Usage:

split_text(text, max_size_bytes = 29000, tokenize = "sentences")
Arguments:

text: character vector to be split.
max_size_bytes: maximum size of a single text segment in bytes.
tokenize: level of tokenization. Either "sentences" or "words".
Details:

The function uses tokenizers::tokenize_sentences to split texts.
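The documentation does not say how sentences (or words) are grouped once they have been tokenized. One plausible approach is greedy packing: keep appending tokens to the current segment until the next one would push it past the byte limit, then start a new segment. The sketch below assumes exactly that and is not the implementation of split_text. It swaps tokenizers::tokenize_sentences for a crude regex (a sentence is assumed to end at ".", "!" or "?" followed by whitespace) so it runs in base R, and the function name sketch_split is hypothetical. Sizes are measured with nchar(type = "bytes") on UTF-8 text, so a character such as "\u00e9" counts as 2 bytes rather than 1.

```r
# Hypothetical sketch of byte-limited splitting for ONE string.
# Assumptions (not taken from split_text()): sentences end at ".", "!" or "?"
# followed by whitespace, words are separated by whitespace, and tokens are
# packed greedily until the next one would exceed max_size_bytes.
sketch_split <- function(text, max_size_bytes = 29000,
                         tokenize = c("sentences", "words")) {
  tokenize <- match.arg(tokenize)
  text <- enc2utf8(text)  # count bytes as UTF-8
  pattern <- if (tokenize == "sentences") "(?<=[.!?])\\s+" else "\\s+"
  tokens <- strsplit(text, pattern, perl = TRUE)[[1]]

  segments <- character(0)
  buffer <- character(0)  # tokens of the segment currently being built
  used <- 0               # bytes in buffer, including the joining spaces
  for (tok in tokens) {
    size <- nchar(tok, type = "bytes")
    needed <- if (length(buffer) > 0) used + 1 + size else size
    if (needed <= max_size_bytes || length(buffer) == 0) {
      # Token fits, or it is the first token of a segment (an oversized
      # token is therefore kept alone in its own segment).
      buffer <- c(buffer, tok)
      used <- needed
    } else {
      # Close the current segment and start a new one with this token.
      segments <- c(segments, paste(buffer, collapse = " "))
      buffer <- tok
      used <- size
    }
  }
  if (length(buffer) > 0) segments <- c(segments, paste(buffer, collapse = " "))
  segments
}

# Usage with the text from the Examples section
text <- paste0(rep("This is a very long text.", 10000), collapse = " ")
segs <- sketch_split(text)
length(segs)                      # 9 segments
max(nchar(segs, type = "bytes"))  # 28989, within the 29000 default
```

In this sketch a single token longer than max_size_bytes ends up as its own oversized segment; how split_text handles that case is not documented here. Rejoining tokens with a single space also normalizes the whitespace between them.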
Value:

Returns a tibble with the following columns:

text_id: position of the text in the input character vector.
segment_id: ID of a text segment.
segment_text: text segment that is smaller than max_size_bytes.
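Because every row carries its text_id, segments that were processed separately can be regrouped into one string per input text. The documentation does not say how segments should be rejoined, so the base-R sketch below assumes they go back together in segment_id order with a single space; the helper name reassemble_segments is hypothetical, and the toy table only borrows the documented column names (its values are made up). Sorting by both columns means the sketch works whether segment_id restarts for each text or runs across all texts, and it accepts a tibble as well as a data frame.

```r
# Hypothetical helper (not part of the documented API): rebuilds one string
# per text_id from a table with the columns returned by split_text().
# Assumes segments are rejoined in segment_id order with a single space.
reassemble_segments <- function(segments, sep = " ") {
  # Sort so that segments of the same text are consecutive and in order.
  segments <- segments[order(segments$text_id, segments$segment_id), ]
  # Group segment texts by text_id and collapse each group into one string.
  pieces <- split(segments$segment_text, segments$text_id)
  vapply(pieces, paste, character(1), collapse = sep)
}

# Toy table with the documented column names; the values are made up.
toy <- data.frame(
  text_id      = c(1, 1, 2),
  segment_id   = c(1, 2, 3),
  segment_text = c("First sentence.", "Second sentence.", "Another text."),
  stringsAsFactors = FALSE
)
reassemble_segments(toy)
# Byte check on the segments, mirroring the max_size_bytes contract
all(nchar(toy$segment_text, type = "bytes") < 29000)
```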
Examples:

## Not run:
# Split a long text (259,999 bytes) into segments below the default max_size_bytes
text <- paste0(rep("This is a very long text.", 10000), collapse = " ")
split_text(text)
## End(Not run)