split_documents: split_documents


View source: R/corpus.R

Description

Split documents in a corpus into documents of one or more paragraphs.

Usage

split_documents(corpus, chunksize, preserveMetadata = TRUE)

Arguments

corpus

A Corpus object.

chunksize

The maximum number of paragraphs each new document should contain.

preserveMetadata

Whether to preserve the metadata of the original documents.

Value

A Corpus object with split documents.

Examples

file <- system.file("texts", "reut21578-factiva.xml", package="tm.plugin.factiva")
corpus <- import_corpus(file, "factiva", language="en")
split_documents(corpus, 3)
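
To illustrate what "at most chunksize paragraphs" means, here is a minimal base R sketch of the grouping rule. It is not the R.temis implementation: chunk_paragraphs is a hypothetical helper introduced only for this illustration, and it works on a plain character vector of paragraphs rather than on a Corpus object.

```r
# Hypothetical helper, not part of R.temis: group a character vector of
# paragraphs into chunks holding at most `chunksize` paragraphs each.
chunk_paragraphs <- function(paragraphs, chunksize) {
  stopifnot(is.numeric(chunksize), chunksize >= 1)
  # Chunk index of each paragraph, e.g. 1, 1, 1, 2, 2, 2, 3 for chunksize = 3
  groups <- ceiling(seq_along(paragraphs) / chunksize)
  # split() returns one list element per chunk; drop the automatic names
  unname(split(paragraphs, groups))
}

paragraphs <- c("p1", "p2", "p3", "p4", "p5", "p6", "p7")
chunks <- chunk_paragraphs(paragraphs, 3)
lengths(chunks)  # 3 3 1: the last chunk holds the leftover paragraph
```

Under this rule, a document whose paragraph count is not a multiple of chunksize ends with one shorter chunk, which is consistent with chunksize being an upper bound rather than an exact size.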

R.temis documentation built on May 13, 2021, 1:08 a.m.