View source: R/corpus_tidiers.R
Tidy a Corpus object from the tm package. Returns a data frame
with one row per document, with a text column containing
the document's text and one column for each local (per-document)
metadata tag. For corpus objects from the quanteda package,
see tidy.corpus.
| x | A Corpus object, such as a VCorpus or PCorpus | 
| collapse | A string used to collapse the text within each document (if a document has multiple lines). Give NULL to not collapse strings, in which case the text column will be a list column if there are multi-line documents. | 
| ... | Extra arguments, not used | 
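To make the effect of collapse concrete, here is a minimal base-R sketch. It does not call tidytext or tm; it only imitates the described behaviour on a hypothetical two-document list, and the names docs, collapsed, df_collapsed and df_list are illustrative assumptions rather than part of either package.

```r
# Hypothetical stand-in for a corpus: one element per document,
# each holding that document's lines as a character vector.
docs <- list(
  doc1 = c("first line", "second line"),
  doc2 = "only line"
)

# With a collapse string, each document's lines are pasted into one string,
# so the text column is an ordinary character column.
collapsed <- vapply(docs, paste, character(1), collapse = "\n")
df_collapsed <- data.frame(
  id = names(docs),
  text = unname(collapsed),
  stringsAsFactors = FALSE
)

# With collapse = NULL, lines are left separate, so the text column
# has to be a list column (one character vector per document).
df_list <- data.frame(id = names(docs), stringsAsFactors = FALSE)
df_list$text <- unname(docs)

str(df_collapsed)
str(df_list)
```

In this sketch the first data frame holds one string per document, while the second keeps a character vector of length two for the multi-line document, which is why the column becomes a list.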
library(dplyr)     # displaying tbl_dfs
library(tidytext)  # provides the tidy() method for Corpus objects
if (requireNamespace("tm", quietly = TRUE)) {
  library(tm)
  # tm package examples
  txt <- system.file("texts", "txt", package = "tm")
  ovid <- VCorpus(DirSource(txt, encoding = "UTF-8"),
                  readerControl = list(language = "lat"))
  ovid
  tidy(ovid)
  # choose different options for collapsing text within each
  # document
  tidy(ovid, collapse = "")$text
  tidy(ovid, collapse = NULL)$text
  # another example from Reuters articles
  reut21578 <- system.file("texts", "crude", package = "tm")
  reuters <- VCorpus(DirSource(reut21578),
                     readerControl = list(reader = readReut21578XMLasPlain))
  reuters
  tidy(reuters)
}