View source: R/corpus_reshape.R
corpus_reshape (R Documentation)
For a corpus, reshape (or recast) the documents to a different level of aggregation. Units of aggregation can be defined as documents, paragraphs, or sentences. Because the corpus object records its current "units" status, it is possible to move from recast units back to the original units: for example, from documents to sentences and then back to documents (possibly after modifying the sentences).
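A minimal sketch of the documents → sentences → documents round trip, assuming quanteda is installed; the two-document corpus and its names (`doc1`, `doc2`) are illustrative only.

```r
library(quanteda)

# Illustrative two-document corpus (hypothetical texts)
corp <- corpus(c(doc1 = "First sentence. Second sentence.",
                 doc2 = "Only one sentence here."))

# Recast from documents to sentences: one new document per sentence
corp_sent <- corpus_reshape(corp, to = "sentences")
ndoc(corp_sent)

# Recast back to the original document units
corp_doc <- corpus_reshape(corp_sent, to = "documents")
docnames(corp_doc)
```

The round trip relies on the recorded "units" status; the sentence-level corpus could be modified (for example with `corpus_subset()`) before being recast back to documents.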
corpus_reshape(
x,
to = c("sentences", "paragraphs", "documents"),
use_docvars = TRUE,
...
)
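Because `to` lists its accepted values as a character vector, the first element, `"sentences"`, is presumably the default when `to` is omitted (assuming the usual `match.arg()` handling). A small check, using a hypothetical one-document corpus:

```r
library(quanteda)

corp <- corpus(c(a = "One. Two. Three."))

# Omitting `to` is assumed to behave like to = "sentences"
by_default <- corpus_reshape(corp)
explicit   <- corpus_reshape(corp, to = "sentences")
identical(as.character(by_default), as.character(explicit))
```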
x | corpus whose document units will be reshaped
to | new document units in which the corpus will be recast
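use_docvars | if TRUE, repeat the docvar values for each segmented text; if FALSE, drop the docvars in the reshaped corpus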
... | additional arguments passed to tokens(), since the syntactic segmenter uses this function
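The effect of `use_docvars` can be sketched as follows, assuming quanteda is installed; the corpus, its document names (`t1`, `t2`), and the `country` docvar are illustrative only.

```r
library(quanteda)

corp <- corpus(c(t1 = "A first sentence. A second sentence.",
                 t2 = "A third sentence."),
               docvars = data.frame(country = c("UK", "USA")))

# use_docvars = TRUE (the default): each sentence repeats its document's docvars
with_dv <- corpus_reshape(corp, to = "sentences")
docvars(with_dv, "country")

# use_docvars = FALSE: the docvars are dropped from the reshaped corpus
without_dv <- corpus_reshape(corp, to = "sentences", use_docvars = FALSE)
ncol(docvars(without_dv))
```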
A corpus object with the documents defined as the new units, including document-level metadata identifying the original documents.
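A quick way to see that each new unit still points back to its source document is to inspect the new document names. This sketch assumes quanteda is installed and that segment names extend the original names (for example `alpha.1`); treat the exact naming format as version-dependent. The corpus is hypothetical.

```r
library(quanteda)

corp <- corpus(c(alpha = "One sentence. Two sentences.",
                 beta  = "Three sentences."))

corp_sent <- corpus_reshape(corp, to = "sentences")

# Each sentence-level document name is assumed to start with its source document's name
docnames(corp_sent)
```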
# simple example
corp1 <- corpus(c(textone = "This is a sentence. Another sentence. Yet another.",
textwo = "Premiere phrase. Deuxieme phrase."),
docvars = data.frame(country=c("UK", "USA"), year=c(1990, 2000)))
summary(corp1)
summary(corpus_reshape(corp1, to = "sentences"))
# example with inaugural corpus speeches
(corp2 <- corpus_subset(data_corpus_inaugural, Year>2004))
corp2para <- corpus_reshape(corp2, to = "paragraphs")
corp2para
summary(corp2para, 50, showmeta = TRUE)
## Note that Bush 2005 is recorded as a single paragraph because that text
## used a single \n to mark the end of a paragraph.
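The note above implies that a paragraph boundary is a blank line (`"\n\n"`) rather than a single newline. A minimal sketch of that difference, assuming quanteda is installed; the two texts are hypothetical.

```r
library(quanteda)

# 'blank' separates its paragraphs with an empty line;
# 'single' uses only one newline between them
corp <- corpus(c(blank  = "First paragraph.\n\nSecond paragraph.",
                 single = "First paragraph.\nSecond paragraph."))

corp_para <- corpus_reshape(corp, to = "paragraphs")

# 'blank' should be split into two paragraphs; 'single' should stay whole
ndoc(corp_para)
```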