View source: R/corpus_tidiers.R
tidy.Corpus R Documentation
Tidy a Corpus object from the tm package. Returns a data frame with one row per document, a text column containing the document's text, and one column for each local (per-document) metadata tag. For corpus objects from the quanteda package, see tidy.corpus().
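The output shape described above can be checked on a tiny in-memory corpus. This is a minimal sketch. It assumes tm and tidytext are installed and that attaching tidytext makes tidy() available. The two document strings are made up for illustration. Beyond the text column, which metadata columns appear depends on the tags tm attaches to each document.

```r
library(tm)
library(tidytext)  # provides the tidy() method for tm Corpus objects

# Two short in-memory documents (made-up text for illustration)
corp <- VCorpus(VectorSource(c("The first document.", "The second document.")))

tidied <- tidy(corp)

# Expect one row per document, a text column, and one column per
# local metadata tag that tm attached to each document
nrow(tidied)
tidied$text
names(tidied)
```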
## S3 method for class 'Corpus'
tidy(x, collapse = "\n", ...)
x | A Corpus object, such as a VCorpus or PCorpus.
collapse | A string used to collapse the text within each document (if a document has multiple lines). Pass NULL to leave the lines uncollapsed; in that case the text column becomes a list column whenever any document has multiple lines.
... | Extra arguments, not used.
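The effect of the collapse argument on a multi-line document can be illustrated with base R's paste(). This is only a conceptual sketch. It assumes that a multi-line tm document behaves like a character vector with one element per line, and it does not claim that tidy.Corpus calls paste() internally. The variable names are hypothetical.

```r
# Hypothetical multi-line document: one character element per line
doc_lines <- c("first line", "second line")

joined_default <- paste(doc_lines, collapse = "\n")  # like collapse = "\n" (the default)
joined_empty   <- paste(doc_lines, collapse = "")    # like collapse = ""
kept_lines     <- paste(doc_lines, collapse = NULL)  # NULL: the lines stay separate

# With NULL, each document keeps a vector of lines, so a column built from
# several documents has to be a list rather than a plain character vector
text_col <- list(kept_lines, "single-line document")
```

Because paste() with collapse = NULL returns its input vector unchanged, documents with different line counts cannot share an ordinary character column. That is why NULL leads to a list column.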
library(dplyr) # displaying tbl_dfs
library(tidytext) # provides the tidy() method for Corpus objects

if (requireNamespace("tm", quietly = TRUE)) {
  library(tm)
  # tm package examples
  txt <- system.file("texts", "txt", package = "tm")
  ovid <- VCorpus(DirSource(txt, encoding = "UTF-8"),
                  readerControl = list(language = "lat"))

  ovid
  tidy(ovid)

  # choose different options for collapsing text within each
  # document
  tidy(ovid, collapse = "")$text
  tidy(ovid, collapse = NULL)$text

  # another example from Reuters articles
  reut21578 <- system.file("texts", "crude", package = "tm")
  reuters <- VCorpus(DirSource(reut21578),
                     readerControl = list(reader = readReut21578XMLasPlain))

  reuters
  tidy(reuters)
}
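Once a corpus is tidied, its text column can feed tidytext's tokenizer directly. The following is a minimal sketch, assuming tm, tidytext, and dplyr are installed. The document strings are made up. The select() step keeps only the id (if present) and text columns before tokenizing. This is a precaution against unusual metadata column types, not a requirement stated by this page.

```r
library(tm)
library(tidytext)
library(dplyr)

# Two short in-memory documents (made-up text for illustration)
corp <- VCorpus(VectorSource(c("The first document.", "The second document.")))

word_counts <- tidy(corp) %>%
  select(any_of(c("id", "text"))) %>%  # keep the id (if present) and text columns
  unnest_tokens(word, text) %>%        # one row per word; lowercases and drops punctuation by default
  count(word, sort = TRUE)             # word frequencies across all documents

word_counts
```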