Tidiers for a corpus object from the quanteda package
Tidy a corpus object from the quanteda package.
tidy returns a tbl_df with one row per document, with a text column
containing the document's text and one column for each document-level
metadata field.
glance returns a one-row tbl_df with corpus-level metadata,
such as source and created. For Corpus objects from the tm package,
A Corpus object, such as a VCorpus or PCorpus
Extra arguments, not used
For the most part, the tidy output is equivalent to the "documents" data
frame in the corpus object, except that it is converted to a tbl_df, and
the texts column is renamed to text to be consistent with other uses in
tidytext.
glance output is simply the "metadata" object,
with NULL fields removed and turned into a one-row tbl_df.
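The behavior described above can be sketched with a small corpus. This is only an illustration: it assumes that quanteda (version 2.0 or later, where corpus() accepts docvars and meta arguments) and tidytext are installed, and the document texts, the author variable, and the source value are made up for the example.

```r
# Sketch only: assumes quanteda (>= 2.0) and tidytext are installed.
library(quanteda)
library(tidytext)

# Hypothetical two-document corpus with one document-level variable
# (author) and one corpus-level metadata field (source).
corp <- corpus(
  c(doc1 = "The quick brown fox.", doc2 = "Jumps over the lazy dog."),
  docvars = data.frame(author = c("A", "B")),
  meta = list(source = "illustrative example")
)

# One row per document: a text column plus one column per document variable.
tidied <- tidy(corp)
tidied

# Corpus-level metadata with NULL fields dropped.
glance(corp)
```

The exact columns returned by glance depend on which metadata fields the corpus actually carries, so the result may differ across quanteda versions.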