twoNewsGroups: Document term matrix for documents sampled from two...

twoNewsGroups R Documentation

Document term matrix for documents sampled from two newsgroups

Description

A dataset containing two document-term matrices, one for each of two newsgroups (rec.sport.baseball and sci.med), built from subsets of the 20 Newsgroups dataset.

Usage

twoNewsGroups

Format

A list of two matrices, each of dimension 594 by 16214 (documents by terms). The (i,j) entry of each matrix is the count (term frequency) of the jth word in the ith document. The first matrix in the list contains 594 documents sampled from the rec.sport.baseball newsgroup. The second contains 594 documents sampled from the sci.med newsgroup.
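
As a rough illustration of the layout described above, the sketch below builds a small stand-in list with the same structure (two count matrices, documents in rows, terms in columns) and shows how entries and margins can be read. The object `toy` and its values are hypothetical and are not part of the package; with the real data one would presumably substitute `twoNewsGroups[[1]]` and `twoNewsGroups[[2]]`, assuming the dataset is available once hddtest is loaded.

```r
# Hypothetical stand-in with the same structure as twoNewsGroups:
# a list of two document-by-term count matrices (3 documents x 4 terms here).
toy <- list(
  matrix(c(2, 0, 1, 0,
           0, 3, 0, 1,
           1, 1, 0, 0), nrow = 3, byrow = TRUE),
  matrix(c(0, 0, 4, 1,
           1, 0, 0, 2,
           0, 2, 1, 0), nrow = 3, byrow = TRUE)
)

baseball <- toy[[1]]  # with the real data (assumed): twoNewsGroups[[1]]
med      <- toy[[2]]  # with the real data (assumed): twoNewsGroups[[2]]

dim(baseball)         # documents by terms (594 by 16214 in the real data)
baseball[2, 2]        # count of the 2nd term in the 2nd document

doc_lengths <- rowSums(baseball)  # total term count per document
term_totals <- colSums(baseball)  # total count of each term across documents
```

Indexing by position (`[[1]]`, `[[2]]`) follows the ordering given in the Format section; whether the list elements also carry names is not stated here, so the sketch does not rely on them.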

Source

http://qwone.com/~jason/20Newsgroups/


AmandaRP/hddtest documentation built on March 18, 2023, 5:53 p.m.