newsgroups: A shortened collection of newsgroup messages with the first 3...

Description

The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups. This package includes only the first 3 classes, for demonstration purposes.

Usage

Format

newsgroup.train.documents and newsgroup.test.documents together comprise a corpus of 2731 newsgroup documents, split into 1633 training and 1098 test cases and distributed approximately evenly across 3 classes.

newsgroup.train.labels is a numeric vector of length 1633 which gives a class label from 1 to 3 for each training document in the corpus.

newsgroup.test.labels is a numeric vector of length 1098 which gives a class label from 1 to 3 for each test document in the corpus.

newsgroup.vocab is the vocabulary of the corpus.

stopwords contains English stopwords extracted from the tm package.
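
The sizes stated above can be checked with a short sanity test after loading the data. This is a minimal sketch, not the package's own example: it assumes that medSTC is installed and that each data set can be loaded by name with data(), which is the usual R convention but is not confirmed by this page.

```r
# Assumption: the medSTC package is installed and its data sets load by
# name via data(); this is the common R convention, not confirmed here.
library(medSTC)

data(newsgroup.train.documents)
data(newsgroup.train.labels)
data(newsgroup.test.documents)
data(newsgroup.test.labels)
data(newsgroup.vocab)

# Label vectors should match the lengths given in the Format section.
stopifnot(length(newsgroup.train.labels) == 1633)
stopifnot(length(newsgroup.test.labels) == 1098)

# Every label should be one of the 3 classes.
stopifnot(all(newsgroup.train.labels %in% 1:3))
stopifnot(all(newsgroup.test.labels %in% 1:3))

# Number of training documents per class.
table(newsgroup.train.labels)
```

If any stopifnot() call fails, the installed data differs from the description on this page.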

Source

http://qwone.com/~jason/20Newsgroups/

Examples
