rec: Recreation (C-2)


Description

A corpus created from a random subset of articles in the following 20Newsgroups categories: rec.autos, rec.motorcycles, rec.sport.baseball, and rec.sport.hockey.

All four of these categories are classified under the super-category Recreation in the 20Newsgroups dataset.

Usage

data("rec")

Format

vocab a vector of unique words in the corpus vocabulary.

docs a list of documents in the corpus. Each item represents one document and is a 2 x U matrix of word frequencies, where U is the number of unique words in that document. Each column of the matrix represents one unique word in the document and contains the word's index in the vocabulary and the word's frequency in the document.

docs.metadata a matrix of document (article) metadata, where each row represents a document with

doc.N a vector of word counts, with one entry per document in the corpus

num.docs the number of documents in the corpus

class.labels a vector of unique categories (classes) in the corpus

ds.name the corpus name (string)

ds a list of two equal-length vectors
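The relationship between vocab, docs, doc.N, and num.docs can be illustrated on a small corpus. The following is a minimal base-R sketch on a made-up two-document corpus, not on the contents of rec. It assumes, following the lda package's document format, that the first row of each document matrix holds 0-based vocabulary indices and the second row holds word frequencies. The helper doc_words is hypothetical and is not part of the package.

```r
# Hypothetical toy corpus in the documented format (NOT the real `rec` data).
# Assumption: row 1 of each 2 x U matrix holds 0-based vocabulary indices
# (the lda package convention); row 2 holds the word's frequency.
vocab <- c("engine", "helmet", "pitcher", "puck")

docs <- list(
  # matrix() fills column-major, so each pair below is one column (index, frequency)
  matrix(c(0L, 3L,    # "engine" appears 3 times
           1L, 1L),   # "helmet" appears once
         nrow = 2),
  matrix(c(2L, 2L,    # "pitcher" appears twice
           3L, 4L,    # "puck" appears 4 times
           0L, 1L),   # "engine" appears once
         nrow = 2)
)

# doc.N: total word count per document = sum of the frequency row
doc.N <- vapply(docs, function(d) sum(d[2, ]), integer(1))
num.docs <- length(docs)

# Hypothetical helper: map a document's columns back to words.
# Adds 1 to convert the assumed 0-based indices to R's 1-based indexing.
doc_words <- function(d, vocab) {
  setNames(d[2, ], vocab[d[1, ] + 1L])
}

print(doc.N)
print(doc_words(docs[[2]], vocab))
```

Storing only the unique words of each document, rather than one entry per token, keeps each document matrix small while still allowing doc.N to be recovered by summing the frequency row.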

Note

Created on November 21, 2015

Author(s)

Clint P. George

Source

Articles and categories are adapted from the 20Newsgroups dataset.

See Also

Other datasets: autos-motorcycles, bop, canis, cats, felines, ibm-mac, med-christian-baseball, sci, whales, wt16m, wt16, wt


clintpgeorge/ldamcmc documentation built on Feb. 22, 2020, 12:39 p.m.