newsgroups: A shortened collection of newsgroup messages with the first 3...
In medSTC: A max-margin supervised Sparse Topical Coding Model

The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups. This package includes only the first 3 classes, for demonstration purposes.

data(newsgroup.train.documents)
data(newsgroup.test.documents)
data(newsgroup.train.labels)
data(newsgroup.test.labels)
data(newsgroup.vocab)

newsgroup.train.documents and newsgroup.test.documents together form a corpus of 2731 newsgroup documents, split into 1633 training and 1098 test documents and distributed evenly across the 3 classes.

newsgroup.train.labels is a numeric vector of length 1633 which gives a class label from 1 to 3 for each training document in the corpus.

newsgroup.test.labels is a numeric vector of length 1098 which gives a class label from 1 to 3 for each test document in the corpus.
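
The documented lengths and label range can be checked with a few lines of R. The sketch below is only an illustration: summarise_labels is a hypothetical helper, not part of medSTC, and it assumes the label vectors match the description above (numeric, values 1 to 3, lengths 1633 and 1098). The package data are loaded only if medSTC is installed.

```r
# Hypothetical helper (not part of medSTC): summarise a label vector and
# check it against the documented length and label range.
summarise_labels <- function(labels, expected_n, n_classes = 3) {
  stopifnot(is.numeric(labels), length(labels) == expected_n)
  stopifnot(all(labels %in% seq_len(n_classes)))
  # Count documents per class, keeping classes that have no documents.
  counts <- table(factor(labels, levels = seq_len(n_classes)))
  list(counts = counts, proportions = as.numeric(counts) / expected_n)
}

# Only touch the package data when medSTC is actually installed.
if (requireNamespace("medSTC", quietly = TRUE)) {
  data(newsgroup.train.labels, package = "medSTC")
  data(newsgroup.test.labels, package = "medSTC")
  # Lengths 1633 and 1098 are taken from the descriptions above.
  print(summarise_labels(newsgroup.train.labels, 1633)$counts)
  print(summarise_labels(newsgroup.test.labels, 1098)$counts)
}
```

If the stored labels differ from the description (for example, a different length), the stopifnot calls stop with an error instead of returning a misleading summary.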

newsgroup.vocab is the vocabulary of the corpus.

stopwords contains English stopwords extracted from the tm package.

http://qwone.com/~jason/20Newsgroups/

data(newsgroup.train.documents)
data(newsgroup.test.documents)
data(newsgroup.train.labels)
data(newsgroup.test.labels)
data(newsgroup.vocab)
data(stopwords)