nlp_split_sentences: Split Text into Sentences
In textpress: A Lightweight and Versatile NLP Toolkit

Description

Splits the text in a data frame into individual sentences, grouped by the columns named in 'text_hierarchy'. A list of known abbreviations is supplied so that they can be accounted for when sentence boundaries are detected.

Usage

nlp_split_sentences(
  tif,
  text_hierarchy = c("doc_id"),
  abbreviations = textpress::abbreviations
)

Arguments

`tif`	A data frame containing the text to be split into sentences (the example below uses a 'doc_id' column and a 'text' column).
`text_hierarchy`	A character vector specifying the columns to group by for sentence splitting, usually 'doc_id'.
`abbreviations`	A character vector of abbreviations to handle during sentence splitting, defaults to textpress::abbreviations.
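
The role of an abbreviation list can be illustrated with a small base R sketch. This is not the textpress implementation; the helper name `split_with_abbreviations` and the `<prd>` placeholder are hypothetical, and it assumes abbreviations are written with their trailing period (for example "Dr."). The idea is to mask the periods inside known abbreviations, split on sentence-final punctuation, then restore the masked periods.

```r
# Illustrative sketch only; NOT how textpress implements sentence splitting.
# `split_with_abbreviations` and the "<prd>" placeholder are hypothetical.
split_with_abbreviations <- function(text, abbreviations) {
  placeholder <- "<prd>"

  # Mask the periods inside each known abbreviation so they cannot be
  # mistaken for sentence boundaries. Matching is literal (fixed = TRUE),
  # so an abbreviation is also masked if it occurs inside a longer token.
  for (abbr in abbreviations) {
    protected <- gsub(".", placeholder, abbr, fixed = TRUE)
    text <- gsub(abbr, protected, text, fixed = TRUE)
  }

  # Split on whitespace that follows '.', '!' or '?'.
  parts <- strsplit(text, "(?<=[.!?])\\s+", perl = TRUE)[[1]]

  # Restore the masked periods.
  gsub(placeholder, ".", parts, fixed = TRUE)
}

split_with_abbreviations("Dr. Smith arrived. He was late.", c("Dr."))
```

Without the masking step, the period in "Dr." would be treated as the end of a sentence.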

Value

A data.table with the columns specified in 'text_hierarchy', plus 'sentence_id' and 'text'.

Examples

tif <- data.frame(doc_id = c('1'),
                  text = c("Hello world. This is an example. No, this is a party!"))
sentences <- nlp_split_sentences(tif)
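
A slightly fuller call, with both optional arguments spelled out as in the Usage section, might look like the sketch below. The input texts are made up for illustration, and the call assumes the textpress package is installed. No expected output is shown, because the exact split depends on the contents of textpress::abbreviations.

```r
library(textpress)

# Two made-up documents; the second contains abbreviations with periods.
tif <- data.frame(
  doc_id = c("1", "2"),
  text = c(
    "Hello world. This is an example.",
    "Dr. Smith arrived at 5 p.m. The meeting started late."
  ),
  stringsAsFactors = FALSE
)

# Same call as the default, with the arguments written out explicitly.
sentences <- nlp_split_sentences(
  tif,
  text_hierarchy = c("doc_id"),
  abbreviations = textpress::abbreviations
)

# Inspect the returned table.
print(sentences)
```

Per the Value section, the result should carry the 'doc_id' column together with 'sentence_id' and 'text'.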

textpress documentation built on Oct. 14, 2024, 5:08 p.m.