| newsgroups | R Documentation |
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups.
data(newsgroup.train.documents)
data(newsgroup.test.documents)
data(newsgroup.train.labels)
data(newsgroup.test.labels)
data(newsgroup.vocab)
data(newsgroup.label.map)
newsgroup.train.documents and newsgroup.test.documents
comprise a corpus of approximately 20,000 newsgroup documents in the LDA format,
partitioned into 11269 training and 7505 test cases, distributed (nearly) evenly
across 20 classes.
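As a rough illustration of the LDA format, the sketch below decodes one document into word counts. It assumes, following the lda package's documented corpus format, that each document is a two-row integer matrix whose first row holds 0-indexed vocabulary indices and whose second row holds occurrence counts. The objects toy_vocab and toy_doc and the helper decode_document are hypothetical stand-ins, not part of the data set.

```r
# Toy stand-ins (hypothetical) for newsgroup.vocab and for one element
# of newsgroup.train.documents.
toy_vocab <- c("space", "nasa", "orbit", "hockey")
toy_doc <- rbind(c(0L, 2L, 3L),   # row 1: 0-indexed word ids
                 c(2L, 1L, 4L))   # row 2: occurrence counts

# Hypothetical helper: return the counts as a vector named by word.
decode_document <- function(doc, vocab) {
  counts <- doc[2, ]
  names(counts) <- vocab[doc[1, ] + 1L]  # shift ids to R's 1-based indexing
  counts
}

decoded <- decode_document(toy_doc, toy_vocab)
print(decoded)
```

With the real data loaded, and assuming newsgroup.vocab is a character vector, the analogous call would be decode_document(newsgroup.train.documents[[1]], newsgroup.vocab).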
newsgroup.train.labels is a numeric vector of length 11269 which gives
a class label from 1 to 20 for each training document in the corpus.
newsgroup.test.labels is a numeric vector of length 7505 which gives
a class label from 1 to 20 for each test document in the corpus.
newsgroup.vocab is the vocabulary of the corpus.
newsgroup.label.map maps the numeric class labels to actual class names.
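The label vectors and the label map can be combined roughly as follows. This is a minimal sketch on toy data; it assumes the label map can be indexed like a character vector of class names, which may not match the actual type of newsgroup.label.map. The names toy_label_map and toy_labels are hypothetical.

```r
# Toy stand-ins (hypothetical) for newsgroup.label.map and
# newsgroup.train.labels; the real map has 20 entries.
toy_label_map <- c("alt.atheism", "comp.graphics", "sci.space")
toy_labels <- c(3, 1, 3, 2, 3)

# Index the map by the numeric labels to obtain class names.
toy_names <- toy_label_map[toy_labels]

# Count documents per class, keeping classes in label order.
class_counts <- table(factor(toy_names, levels = toy_label_map))
print(class_counts)
```

Running the same tabulation on the real label vectors is one way to check how evenly the documents are spread across the 20 classes.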
http://qwone.com/~jason/20Newsgroups/
See lda.collapsed.gibbs.sampler for the format of the corpus.
data(newsgroup.train.documents)
data(newsgroup.test.documents)
data(newsgroup.train.labels)
data(newsgroup.test.labels)
data(newsgroup.vocab)
data(newsgroup.label.map)