nips: NIPS 00-18 Dataset

Description

A subset of the NIPS dataset (Chechik 2007). It consists of papers published in proceedings 00 to 18 of the Neural Information Processing Systems (NIPS) conference (i.e., the years 1988 to 2005). The corpus contains 2,741 articles and 9,156 unique words.

Usage

data("nips")

Format

vocab a vector of unique words in the corpus vocabulary.

docs a list of documents in the corpus. Each item represents a document and is a 2 x U matrix of word frequencies, where U is the number of unique words in that document. Each column in the matrix represents a unique word in the document and contains

docs.metadata a matrix of document (article) metadata, where each row represents a document with

cids a vector of document collection IDs

class.labels a vector of categories (classes) in the corpus

collection.labels a vector of collections in the corpus

ds.name the corpus name (string)

num.docs the number of documents in the corpus

V the vocabulary size
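
Example

The sparse layout of docs can be easier to follow with a small worked example. The sketch below does not use the real nips data: it builds a toy vocab and docs with the shape described above and expands them into a dense document-term matrix. The row layout is an assumption (row 1 holds 0-based vocabulary indices and row 2 holds counts, as in the CRAN lda package), since the description of the column contents is cut off here. The helper to_dtm is hypothetical and not part of the package.

```r
# Toy stand-in for the structure described in Format; NOT the real nips data.
vocab <- c("neural", "network", "kernel", "bayes")

# ASSUMPTION: row 1 = 0-based vocabulary index, row 2 = word count
# (the convention of the CRAN 'lda' package). Check this against the real data.
docs <- list(
  matrix(as.integer(c(0, 2,  1, 1)), nrow = 2),        # neural x2, network x1
  matrix(as.integer(c(1, 3,  2, 1,  3, 4)), nrow = 2)  # network x3, kernel x1, bayes x4
)

# Hypothetical helper: expand the per-document 2 x U matrices into a
# dense (documents x vocabulary) matrix of counts.
to_dtm <- function(docs, vocab) {
  dtm <- matrix(0L, nrow = length(docs), ncol = length(vocab),
                dimnames = list(NULL, vocab))
  for (d in seq_along(docs)) {
    ids <- docs[[d]][1, ] + 1L      # shift 0-based ids to R's 1-based indexing
    dtm[d, ids] <- docs[[d]][2, ]   # write the counts for this document
  }
  dtm
}

dtm <- to_dtm(docs, vocab)
print(dtm)
```

A dense matrix is convenient for inspection but grows as num.docs x V, so for the full corpus a sparse representation is likely the better choice.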

Note

Created on July 26, 2015

Author(s)

Clint P. George

Source

Articles are downloaded from the link

See Also

Other datasets: news, yelp


clintpgeorge/clda documentation built on May 13, 2019, 8 p.m.