corpus_tidiers | R Documentation |
Tidy a corpus object from the quanteda package. tidy returns a tbl_df with
one row per document, with a text column containing the document's text and
one column for each document-level metadata field. glance returns a one-row
tbl_df with corpus-level metadata, such as source and created. For Corpus
objects from the tm package, see tidy.Corpus().
## S3 method for class 'corpus'
tidy(x, ...)
## S3 method for class 'corpus'
glance(x, ...)
x | A corpus object from the quanteda package |
... | Extra arguments, not used |
For the most part, the tidy output is equivalent to the "documents" data
frame in the corpus object, except that it is converted to a tbl_df and the
texts column is renamed to text, to be consistent with other uses in tidytext.
Similarly, the glance output is simply the "metadata" object, with NULL
fields removed, turned into a one-row tbl_df.
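As a rough illustration of the shape described above, the following sketch builds a small corpus and checks the tidied result. The two-document corpus, its document names, and the author variable are hypothetical and exist only for this example; it also assumes that both quanteda and tidytext are installed.

```r
# Hypothetical two-document corpus; names and values are illustrative only.
if (requireNamespace("quanteda", quietly = TRUE) &&
    requireNamespace("tidytext", quietly = TRUE)) {
  corp <- quanteda::corpus(
    c(doc1 = "A first short text.", doc2 = "A second short text."),
    docvars = data.frame(author = c("Ann", "Bob"))
  )

  td <- tidytext::tidy(corp)

  # One row per document, with the document text in a column named "text"
  print(nrow(td) == quanteda::ndoc(corp))
  print("text" %in% names(td))

  # Document-level metadata (the assumed "author" variable) becomes a column
  print("author" %in% names(td))
}
```

Because the result is an ordinary tbl_df, it can be passed on to other tidytext or dplyr functions without further conversion.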
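# Run only when quanteda is installed
if (requireNamespace("quanteda", quietly = TRUE)) {
  # Load the example corpus of US presidential inaugural addresses
  data("data_corpus_inaugural", package = "quanteda")
  data_corpus_inaugural
  # One row per document, with a text column plus document-level metadata
  tidy(data_corpus_inaugural)
}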