Create simple corpora.
control: a named list of control parameters.
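A minimal sketch of passing `control` when building a simple corpus. It assumes the tm package is installed together with its bundled sample texts (a directory of Latin Ovid excerpts). The only control entry shown is `language`, set here to Latin.

```r
library(tm)

# Directory of sample plain-text files shipped with the tm package.
txt <- system.file("texts", "txt", package = "tm")

# `control` is a named list; here it only sets the corpus language.
ovid <- SimpleCorpus(DirSource(txt, encoding = "UTF-8"),
                     control = list(language = "lat"))

ovid
```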
A simple corpus is kept fully in memory. Compared to a VCorpus,
it is optimized for the most common usage scenario: importing plain texts from
files in a directory or directly from a vector in R, preprocessing and
transforming the texts, and finally exporting them to a term-document matrix.
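The usage scenario above can be sketched end to end as follows, assuming the tm package is installed. The two documents are made-up sample data, and the preprocessing steps chosen (lowercasing, punctuation removal, whitespace stripping) are illustrative rather than required.

```r
library(tm)

# Import: documents supplied directly from a character vector.
docs <- c("The quick brown fox.", "The lazy dog sleeps!")
corp <- SimpleCorpus(VectorSource(docs))

# Preprocess: each step maps a character vector to one of the same length.
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removePunctuation)
corp <- tm_map(corp, stripWhitespace)

# Export: terms in rows, documents in columns (default control settings).
tdm <- TermDocumentMatrix(corp)
inspect(tdm)
```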
It adheres to the Corpus API. However, it internally takes various
shortcuts to boost performance and minimize memory pressure; consequently, it
operates only under the following constraints:
no custom readers, i.e., each document is read in and stored as plain text (as a string, i.e., a character vector of length one),
transformations applied via tm_map must be able to process character vectors and return character vectors (of the same length),
no lazy transformations in tm_map,
no metadata for individual documents (i.e., no "local" in meta).
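The transformation constraint can be checked before a function is handed to `tm_map`. The helper `is_simple_transformation()` below is hypothetical (it is not part of tm) and uses only base R: it runs a candidate function on a small sample character vector and tests whether the result is a character vector of the same length. The candidate functions are also illustrative.

```r
# Hypothetical helper (not part of tm): does FUN map a character vector
# to a character vector of the same length?
is_simple_transformation <- function(FUN, sample = c("First doc.", "Second doc 2!")) {
  out <- FUN(sample)
  is.character(out) && length(out) == length(sample)
}

# Vectorized and length-preserving: suitable.
mask_digits <- function(x) gsub("[0-9]", "#", x)

# Collapses all documents into one string: unsuitable.
collapse_docs <- function(x) paste(x, collapse = " ")

# Returns numeric word counts instead of text: unsuitable.
count_words <- function(x) lengths(strsplit(x, "\\s+"))

is_simple_transformation(mask_digits)   # TRUE
is_simple_transformation(collapse_docs) # FALSE
is_simple_transformation(count_words)   # FALSE
is_simple_transformation(tolower)       # TRUE
```

Functions that fail this check, such as `collapse_docs()`, do not meet the constraint listed above for transformations applied via tm_map.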
An object inheriting from SimpleCorpus and Corpus.
See Corpus for basic information on the corpus infrastructure
employed by package tm.