The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups.
newsgroup.train.documents
and newsgroup.test.documents
comprise a corpus of approximately 20,000 newsgroup documents conforming to the LDA format,
partitioned into 11269 training and 7505 test cases, distributed nearly evenly
across 20 classes.
newsgroup.train.labels
is a numeric vector of length 11269 which gives
a class label from 1 to 20 for each training document in the corpus.
newsgroup.test.labels
is a numeric vector of length 7505 which gives
a class label from 1 to 20 for each test document in the corpus.
newsgroup.vocab
is the vocabulary of the corpus.
newsgroup.label.map
maps the numeric class labels to actual class names.
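The following is a minimal sketch of how the pieces described above fit together, using toy data rather than the real corpus. It assumes the LDA format documented for lda.collapsed.gibbs.sampler: each document is a two-row integer matrix whose first row holds 0-indexed word identifiers and whose second row holds counts. The names vocab, doc, label_map and labels are hypothetical stand-ins, and the character-vector form of label_map is an assumption about how a label map might be stored.

```r
# Toy vocabulary (hypothetical; stands in for newsgroup.vocab).
vocab <- c("space", "orbit", "hockey", "goal")

# One document in the LDA format: a 2-row integer matrix.
# Row 1: 0-indexed word identifiers; row 2: occurrence counts.
# matrix() fills column by column, so the columns are (0,2), (1,1), (3,4).
doc <- matrix(as.integer(c(0, 2,
                           1, 1,
                           3, 4)), nrow = 2)

# Recover the words: add 1 because R vectors are 1-indexed.
words  <- vocab[doc[1, ] + 1L]
counts <- doc[2, ]
word_counts <- setNames(counts, words)
print(word_counts)

# Toy label map (hypothetical; stands in for newsgroup.label.map)
# and numeric labels for three documents.
label_map <- c("sci.space", "rec.sport.hockey")
labels <- c(2, 1, 1)

# Translate numeric class labels into class names.
label_names <- label_map[labels]
print(label_names)
```

The same index shift (adding 1 to the word identifiers) would apply when decoding a real document against newsgroup.vocab, provided the corpus follows the format assumed here.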
http://qwone.com/~jason/20Newsgroups/
lda.collapsed.gibbs.sampler
for the format of the
corpus.
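As a hedged sketch of typical usage, the objects can be loaded with data() and checked against the sizes stated above. This assumes the data set ships with the lda package and that each object is available under its own name; the guard with requireNamespace lets the snippet run without error when the package is not installed.

```r
# Check whether the lda package is available (assumed to provide the data).
have_lda <- requireNamespace("lda", quietly = TRUE)

if (have_lda) {
  # Load each object by name from the package (assumed layout).
  data(newsgroup.train.documents, package = "lda")
  data(newsgroup.test.documents, package = "lda")
  data(newsgroup.train.labels, package = "lda")
  data(newsgroup.test.labels, package = "lda")
  data(newsgroup.vocab, package = "lda")
  data(newsgroup.label.map, package = "lda")

  # Each document list should line up with its label vector.
  stopifnot(length(newsgroup.train.documents) == length(newsgroup.train.labels))
  stopifnot(length(newsgroup.test.documents) == length(newsgroup.test.labels))

  # Number of training documents per class label.
  print(table(newsgroup.train.labels))

  # Inspect the structure of the label map without assuming its type.
  str(newsgroup.label.map)
} else {
  message("Package 'lda' is not installed; skipping the data checks.")
}
```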