felines: Felines (C-7)

Description

A corpus created from a subset of the Wikipedia articles under the categories:

All three categories in this corpus fall under the Wikipedia super-category Felines.

Usage

data(felines)

Format

vocab a vector of unique words in the corpus vocabulary.

docs a list of documents in the corpus. Each item represents one document and is a 2 x U matrix of word frequencies, where U is the number of unique words in that document. Each column in the matrix represents a unique word in a document and contains

docs.metadata a matrix of document (article) metadata, where each row represents a document with

doc.N a vector of word counts of documents in the corpus

num.docs the number of documents in the corpus

class.labels a vector of unique categories (classes) in the corpus

ds.name the corpus name (string)

ds a list of two equal-length vectors
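
The layout described above can be illustrated with a small toy corpus. This is only a sketch, not the real felines data: the words, the counts, and the helper name words.doc1 are made up for illustration. The row layout is an assumption, since the description of the column contents is cut off above: row 1 is taken to hold 0-based indices into vocab and row 2 the word counts.

```r
# Toy stand-in for the objects described in the Format section
# (hypothetical values, NOT the actual felines corpus).
vocab <- c("lion", "tiger", "cat", "habitat")

# Assumption: row 1 = 0-based index into vocab, row 2 = word count.
# matrix() fills column by column, so each pair below is one column.
docs <- list(
  matrix(c(0L, 2L,    # "lion" occurs twice
           3L, 1L),   # "habitat" occurs once
         nrow = 2),
  matrix(c(1L, 3L,    # "tiger" occurs three times
           2L, 1L,    # "cat" occurs once
           3L, 2L),   # "habitat" occurs twice
         nrow = 2)
)

# Number of documents in the toy corpus.
num.docs <- length(docs)

# Word count per document: sum of the count row of each matrix.
doc.N <- vapply(docs, function(d) sum(d[2, ]), integer(1))

# Words of the first document; add 1 because R indexing is 1-based.
words.doc1 <- vocab[docs[[1]][1, ] + 1L]
```

If the real corpus stores 1-based indices or swaps the two rows, the `+ 1L` shift and the row numbers would need to be adjusted accordingly.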

Note

Created on November 21, 2015

Author(s)

Clint P. George

Source

Articles were downloaded from the English Wikipedia using the MediaWiki API.

See Also

Other datasets: autos-motorcycles, bop, canis, cats, ibm-mac, med-christian-baseball, rec, sci, whales, wt16m, wt16, wt


clintpgeorge/ldamcmc documentation built on Feb. 22, 2020, 12:39 p.m.