A subset of the NIPS dataset (Chechik 2007). It consists of papers published in proceedings 00 to 18 of the Neural Information Processing Systems (NIPS) conference (i.e., years 1988 to 2005). The corpus contains 2,741 articles and 9,156 unique words.
data("nips")
vocab
a vector of unique words in the corpus vocabulary.
docs
a list of documents in the corpus. Each item (representing a
document) is a matrix (2 x U) of word frequencies, where U is the
number of unique words in the document. Each column in the matrix represents
a unique word in the document and contains
vocabulary-id. the index of the word in the vocabulary (starts with 0)
frequency. the relative frequency of the word in the document
docs.metadata
a matrix of document (article) metadata, where each
row represents a document with
doc.id. a unique article id
file.name. the name of the article
row.word.count. the number of words in the article
collection. the collection name of each article
cids
a vector of document collection IDs
class.labels
a vector of categories (classes) in the corpus
collection.labels
a vector of collections in the corpus
ds.name
the corpus name (string)
num.docs
the number of documents in the corpus
V
the vocabulary size
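Because the vocabulary-id values in docs start at 0 while R vectors are indexed from 1, looking up words in vocab needs an offset of one. The following is a minimal sketch of that lookup, not code from the package. It uses a small made-up stand-in for vocab and docs rather than the real data, assumes that row 1 of each matrix holds the vocabulary-id and row 2 the frequency (the order in which they are listed above), and doc_words is a hypothetical helper name.

```r
# Toy stand-in that mimics the documented layout (made-up values, not the real corpus).
vocab <- c("neural", "network", "kernel", "bayes")
docs <- list(
  # matrix() fills column by column, so each pair below is one column:
  # (vocabulary-id, frequency)
  matrix(c(0, 0.75,
           2, 0.25), nrow = 2),
  matrix(c(1, 0.5,
           3, 0.5), nrow = 2)
)

# Hypothetical helper: map a document's 0-based vocabulary ids to words.
# Assumes row 1 = vocabulary-id and row 2 = frequency.
doc_words <- function(doc, vocab) {
  data.frame(
    word = vocab[doc[1, ] + 1],  # shift 0-based ids to R's 1-based indexing
    frequency = doc[2, ],
    stringsAsFactors = FALSE
  )
}

first_doc <- doc_words(docs[[1]], vocab)
print(first_doc)
```

With the real data loaded via data("nips"), the same helper could presumably be applied to any element of docs, provided the row order assumed here matches the stored matrices.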
Created on July 26, 2015
Clint P. George
Articles are downloaded from the link