nlp_build_chunks (R Documentation)

Source: R/nlp_build_chunks.R
Description

Prepares a data frame for NLP analysis by dividing its text into chunks and attaching surrounding context to each chunk. The size of each chunk is set by chunk_size, and the amount of context included around it is set by context_size.
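The following minimal base-R sketch makes the chunk/context idea concrete. It assumes one plausible windowing scheme: chunks are consecutive, non-overlapping runs of `chunk_size` sentences, and the context adds up to `context_size` sentences before and after each chunk. The helper `sketch_chunks()` and its output columns (`chunk_id`, `chunk`, `chunk_plus_context`) are hypothetical. This is not the source of `nlp_build_chunks()`.

```r
# Hypothetical helper, not the nlp_build_chunks() implementation.
# Splits a vector of sentences into non-overlapping chunks of `chunk_size`
# sentences and pairs each chunk with up to `context_size` neighbouring
# sentences on each side (assumed windowing scheme).
sketch_chunks <- function(sentences, chunk_size, context_size) {
  stopifnot(chunk_size >= 1, context_size >= 0)
  n <- length(sentences)
  if (n == 0L) return(NULL)  # nothing to chunk

  starts <- seq(1, n, by = chunk_size)
  rows <- lapply(seq_along(starts), function(i) {
    from <- starts[i]
    to <- min(from + chunk_size - 1, n)      # last chunk may be shorter
    ctx_from <- max(1, from - context_size)  # context before the chunk
    ctx_to <- min(n, to + context_size)      # context after the chunk
    data.frame(
      chunk_id = i,
      chunk = paste(sentences[from:to], collapse = " "),
      chunk_plus_context = paste(sentences[ctx_from:ctx_to], collapse = " "),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

res <- sketch_chunks(c("A.", "B.", "C.", "D.", "E."),
                     chunk_size = 2, context_size = 1)
# res$chunk holds "A. B.", "C. D.", "E."
```

Under this scheme, context windows overlap between neighbouring chunks while the chunks themselves do not, so each sentence belongs to exactly one chunk.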
Usage

nlp_build_chunks(tif, text_hierarchy, chunk_size, context_size)
Arguments

tif              A data.table containing the text to be chunked.
text_hierarchy   A character vector naming the columns used for grouping and chunking.
chunk_size       An integer specifying the size of each chunk.
context_size     An integer specifying the size of the context around each chunk.
Value

A data.table containing each chunk of text and its respective context.
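Examples

# Create a data frame with one sentence per row
tif <- data.frame(doc_id = c('1', '1', '2'),
                  sentence_id = c('1', '2', '1'),
                  text = c("Hello world.",
                           "This is an example.",
                           "This is a party!"))

# Build chunks of 2 sentences with 1 sentence of context,
# grouping by the columns listed in text_hierarchy
chunks <- nlp_build_chunks(tif,
                           chunk_size = 2,
                           context_size = 1,
                           text_hierarchy = c('doc_id', 'sentence_id'))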
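The text_hierarchy argument names the columns used for grouping and chunking. The sketch below, using the example data, shows one way such a hierarchy could define groups. It assumes that the last column (here `sentence_id`) is the unit being chunked and that the preceding columns (here `doc_id`) define the groups within which chunks are formed, so chunks would not cross document boundaries. The variable `group_cols` is hypothetical, and this is an illustration rather than the internals of `nlp_build_chunks()`.

```r
# Hypothetical illustration of grouping by text_hierarchy; not the
# internals of nlp_build_chunks(). Assumes the last hierarchy column
# is the unit being chunked.
tif <- data.frame(doc_id = c('1', '1', '2'),
                  sentence_id = c('1', '2', '1'),
                  text = c("Hello world.",
                           "This is an example.",
                           "This is a party!"),
                  stringsAsFactors = FALSE)
text_hierarchy <- c('doc_id', 'sentence_id')

# Every level except the last defines a group (here: one group per document)
group_cols <- head(text_hierarchy, -1)

# Order rows by the full hierarchy so units appear in reading order
tif <- tif[do.call(order, tif[text_hierarchy]), ]

# One character vector of sentences per group
groups <- split(tif$text, tif[group_cols], drop = TRUE)
# groups[["1"]] holds the two sentences of document 1
```

Because the IDs in this example are character strings, ordering is lexical (for instance, "10" sorts before "2"). With real data, numeric IDs would avoid mis-ordered sentences.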