The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups. For demonstration purposes, this package uses only the first 3 of these classes.
newsgroup.train.documents and newsgroup.test.documents comprise a corpus of 2731 newsgroup documents, partitioned into 1633 training and 1098 test cases distributed approximately evenly across 3 classes.
newsgroup.train.labels is a numeric vector of length 1633 that gives a class label from 1 to 3 for each training document in the corpus.
newsgroup.test.labels is a numeric vector of length 1098 that gives a class label from 1 to 3 for each test document in the corpus.
newsgroup.vocab is the vocabulary of the corpus.
stopwords contains English stopwords extracted from the tm package.
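The label vectors described above carry checkable properties: a fixed length (1633 for training, 1098 for test), values in 1 to 3, and roughly equal class shares. The following is a minimal sketch in base R that checks these properties and returns the class proportions. It assumes the objects have already been loaded into the session (the loading call is not shown here), and `check_labels` is a hypothetical helper for illustration, not a function provided by this package.

```r
# Hypothetical helper (not part of this package): checks that a label
# vector matches the Format description, i.e. it is numeric, has the
# expected length, and only contains class codes 1..n_classes.
check_labels <- function(labels, expected_length, n_classes = 3) {
  stopifnot(is.numeric(labels),
            length(labels) == expected_length,
            all(labels %in% seq_len(n_classes)))
  # Proportion of documents per class; fixing the factor levels keeps
  # empty classes in the result with a share of 0.
  prop.table(table(factor(labels, levels = seq_len(n_classes))))
}

# Usage, assuming the data set's objects are available in the session:
# check_labels(newsgroup.train.labels, 1633)
# check_labels(newsgroup.test.labels, 1098)
```

If the classes are distributed approximately evenly, each returned proportion should be close to 1/3.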
Source: http://qwone.com/~jason/20Newsgroups/