reuters_dt: REUTERS-21578 dataset
In PolMine/bignlp: Fast and Memory-Efficient Annotation of Big Corpora


The package includes several representations of an excerpt from the REUTERS-21578 dataset as sample data. The REUTERS corpus is widely used as sample data for text classification tasks (Silva, Ribeiro 2010). The data here is taken from the tm package. See the files in the 'data-raw' folder of the package for how the sample data was prepared.

reuters_dt

A data.table with 2 columns and 20 rows:

doc_id: These are unique integer values to distinguish documents.
text: The unprocessed plain text of the documents in the corpus.
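
A minimal sketch of how the documented layout might be inspected. It assumes the bignlp package is installed and that reuters_dt is available as lazy-loaded package data. Neither assumption is confirmed here, so the sketch falls back to a placeholder table whose text values are made up and are not REUTERS content. The name reuters_sample is purely illustrative.

```r
library(data.table)

# Assumption: bignlp is installed and exposes reuters_dt as lazy-loaded data.
# Otherwise use a placeholder table with the documented layout; its text
# values are invented stand-ins, not documents from the corpus.
reuters_sample <- if (requireNamespace("bignlp", quietly = TRUE)) {
  bignlp::reuters_dt
} else {
  data.table(doc_id = 1:20, text = rep("placeholder document text", 20))
}

# Check that the two documented columns are present.
stopifnot(all(c("doc_id", "text") %in% names(reuters_sample)))

# Number of documents, and a summary of document lengths in characters.
nrow(reuters_sample)
reuters_sample[, summary(nchar(text))]
```

Because doc_id is described as unique, a check such as `anyDuplicated(reuters_sample$doc_id) == 0` could serve as an additional sanity test before annotation.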

Catarina Silva, Bernardete Ribeiro (2010): Inductive Inference for Large Scale Text Classification: Kernel Approaches and Techniques. Springer: Berlin, pp. 129ff.

PolMine/bignlp documentation built on Jan. 29, 2021, 1:14 a.m.