pack | R Documentation |
Packs a prettified data.frame of tokens into a new data.frame of corpus, which is compatible with the Text Interchange Formats.
pack(tbl, pull = "token", n = 1L, sep = "-", .collapse = " ")
tbl |
A prettified data.frame of tokens. |
pull |
Column to be packed into text or ngrams body. Default value is |
n |
Integer internally passed to ngrams tokenizer function
created of |
sep |
Character scalar internally used as the concatenator of ngrams. |
.collapse |
This argument is passed to |
A data.frame.
The Text Interchange Formats (TIF) is a set of standards that allows R text analysis packages to target defined inputs and outputs for corpora, tokens, and document-term matrices.
The prettified data.frame of tokens here is a data.frame object compatible with the TIF.
A TIF valid data.frame of tokens are expected to have one unique key column (named doc_id
)
of each text and several feature columns of each tokens.
The feature columns must contain at least token
itself.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.