news: 16 Newsgroups Dataset
In clintpgeorge/clda: Approximate Inference Algorithms for the Compound Latent Dirichlet Allocation Model


Description

This dataset is a subset of the 20Newsgroups dataset. The corpus consists of 10,764 news articles and a vocabulary of 9,208 unique words.

Usage

data("news")

Format

vocab a vector of the unique words in the corpus vocabulary.

docs a list of documents in the corpus. Each item represents one document as a 2 x U matrix of word frequencies, where U is the number of unique words in that document. Each column corresponds to one unique word in the document and contains:

vocabulary-id. the index of the word in the vocabulary (0-based)
frequency. the relative frequency of the word in the document

docs.metadata a matrix of document (article) metadata, where each row represents a document and contains the following fields:

category. the category assigned to the article
name. the name of the news article from the 20Newsgroups dataset
doclength. the number of words in the article
collection. the collection name of each article

cids a vector of collection IDs, one per document

class.labels a vector of categories (classes) in the corpus

collection.labels a vector of collections in the corpus

ds.name the corpus name (string)

num.docs the number of documents in the corpus

V the vocabulary size
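The docs layout described above can be illustrated with a small, self-contained sketch. It builds a hypothetical two-document corpus in the same 2 x U format (the toy vocabulary and values are made up, not taken from news) and expands it into a dense document-term matrix, shifting the 0-based vocabulary ids to R's 1-based indexing. The helper docs_to_dtm is hypothetical and is not a function exported by clda.

```r
# Hypothetical toy corpus in the documented `docs` layout (values are made up,
# not taken from the real `news` data).
vocab <- c("space", "orbit", "hockey", "goal")

# Each document is a 2 x U matrix: row 1 = 0-based vocabulary id,
# row 2 = frequency of that word in the document (matrix() fills by column).
docs <- list(
  matrix(c(0L, 2L,
           1L, 1L), nrow = 2),
  matrix(c(2L, 3L,
           3L, 1L,
           0L, 1L), nrow = 2)
)

# Hypothetical helper (not part of clda): expand `docs` into a dense
# document-term matrix with one row per document and one column per word.
docs_to_dtm <- function(docs, vocab) {
  dtm <- matrix(0, nrow = length(docs), ncol = length(vocab),
                dimnames = list(NULL, vocab))
  for (d in seq_along(docs)) {
    ids <- docs[[d]][1, ] + 1L     # shift 0-based ids to R's 1-based indexing
    dtm[d, ids] <- docs[[d]][2, ]  # copy the per-word frequencies
  }
  dtm
}

dtm <- docs_to_dtm(docs, vocab)
dtm[2, "hockey"]  # 3
```

Assuming data("news") makes docs and vocab available as described under Format, the same call would apply to the real corpus. At that scale a dense 10,764 x 9,208 matrix of doubles needs roughly 790 MB, so a sparse representation (for example Matrix::sparseMatrix) is likely the better design choice.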

Note

Created on July 26, 2015.

Author(s)

Clint P. George

Source

The articles were downloaded via scikit-learn.

See Also

Other datasets: nips, yelp

clintpgeorge/clda documentation built on May 13, 2019, 8 p.m.
