nlp_split_paragraphs: Split Text into Paragraphs
In textpress: A Lightweight and Versatile NLP Toolkit

Description

Splits text from the 'text' column of a data frame into individual paragraphs, based on a specified paragraph delimiter.

Usage

nlp_split_paragraphs(tif, paragraph_delim = "\\n+")

Arguments

`tif`	A data frame with at least two columns: 'doc_id' and 'text'.
`paragraph_delim`	A regular expression pattern used to split text into paragraphs. Defaults to "\\n+", which matches one or more consecutive newlines.
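To make the role of the delimiter pattern concrete, here is a minimal base-R sketch using `strsplit()`. It is not the textpress implementation, only an illustration of how a regular-expression delimiter such as "\\n+" behaves compared with a single "\\n"; the variable names are chosen for this example.

```r
# Base-R illustration only; nlp_split_paragraphs() may work differently internally.
text <- "Hello world.\n\nMind your business!"

# "\\n+" treats a run of newlines as a single delimiter
parts_plus <- strsplit(text, "\\n+")[[1]]

# "\\n" splits on every newline, leaving an empty string between the two
parts_single <- strsplit(text, "\\n")[[1]]

print(parts_plus)
print(parts_single)
```

With a single-newline pattern, blank lines between paragraphs show up as empty elements, which is one reason a pattern matching one or more newlines is a convenient default.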

Value

A data.table with columns: 'doc_id', 'paragraph_id', and 'text'. Each row represents a paragraph, along with its associated document and paragraph identifiers.
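The shape of the return value can be sketched in base R. The helper below, `split_paragraphs_sketch()`, is hypothetical and written for this illustration; it returns a plain data frame rather than a data.table, and it assumes that paragraph identifiers restart at 1 within each document, which the documentation above does not state explicitly.

```r
# Hypothetical helper for illustration; not part of textpress.
split_paragraphs_sketch <- function(tif, paragraph_delim = "\\n+") {
  # One character vector of paragraphs per document
  pieces <- strsplit(as.character(tif$text), paragraph_delim)
  n <- lengths(pieces)
  data.frame(
    doc_id = rep(as.character(tif$doc_id), times = n),
    paragraph_id = sequence(n),  # assumed: numbering restarts per document
    text = unlist(pieces),
    stringsAsFactors = FALSE
  )
}

tif <- data.frame(doc_id = c("1", "2"),
                  text = c("Hello world.\n\nMind your business!",
                           "This is an example.\n\nThis is a party!"))
out <- split_paragraphs_sketch(tif)
print(out)
```

Each input document contributes one output row per paragraph, so the two-document input here yields four rows.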

Examples

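library(textpress)

# Two documents, each with two paragraphs separated by a blank line
tif <- data.frame(doc_id = c('1', '2'),
                  text = c("Hello world.\n\nMind your business!",
                           "This is an example.\n\nThis is a party!"))
# One row per paragraph, identified by doc_id and paragraph_id
paragraphs <- nlp_split_paragraphs(tif)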

textpress documentation built on Oct. 14, 2024, 5:08 p.m.