docMatrix-class: An S4 class that contains and manages word-frequency data


Description

The docMatrix class defines a special empson object that represents a document collection as a matrix of word frequencies.

Slots

directory

A string giving the file path to the main directory of the collection.

indexFile

A string giving the file path to the index file for the collection.

type

A string: either "docTerm" for a document-term matrix, or "wordContext" for a word-context matrix.

mat

The matrix of word frequencies itself. What its rows and columns represent depends on type (see below).
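To make the slot layout concrete, here is a minimal S4 sketch with the same four slots. It is an illustration under stated assumptions, not empson's actual class definition: the class name docMatrixSketch, the validity check on type, and the example paths are all hypothetical.

```r
# Hypothetical sketch of an S4 class with the four documented slots.
# NOT empson's source; named docMatrixSketch to avoid clashing with the real class.
setClass("docMatrixSketch",
  slots = c(
    directory = "character",  # path to the collection's main directory
    indexFile = "character",  # path to the collection's index file
    type      = "character",  # "docTerm" or "wordContext"
    mat       = "matrix"      # the word-frequency matrix
  ),
  validity = function(object) {
    # Assumed check: type must name one of the two supported models
    if (length(object@type) != 1 ||
        !(object@type %in% c("docTerm", "wordContext"))) {
      return('type must be "docTerm" or "wordContext"')
    }
    TRUE
  }
)

# A tiny document-term matrix: rows are words, columns are documents
m <- matrix(c(2, 0, 1, 3), nrow = 2,
            dimnames = list(c("love", "reason"), c("doc1", "doc2")))

dm <- new("docMatrixSketch",
          directory = "texts/",            # hypothetical path
          indexFile = "texts/index.csv",   # hypothetical path
          type      = "docTerm",
          mat       = m)

dm@mat["love", "doc2"]  # 1

# An unsupported type fails the (hypothetical) validity check
bad <- tryCatch(
  new("docMatrixSketch", directory = "texts/", indexFile = "texts/index.csv",
      type = "foo", mat = m),
  error = function(e) "error"
)
```

Encoding the two allowed values of type in a validity function means a malformed object is rejected at construction time rather than failing later when the matrix is interpreted.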

What it does

The docMatrix object holds a vector-space model of your document collection. The two most common vector-space models, and the only two currently supported in empson, are document-term matrices and word-context matrices. In a document-term matrix, each column is a document in the collection, each row is a word found in the collection, and each cell records how often that word occurs in that document. In a word-context matrix, each column is a keyword, each row is a context term, and each cell records how often that term occurs within a window around the keyword.
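The two layouts can be sketched in base R on a toy corpus. This is not empson's own code or API: the three documents, the keywords "cat" and "dog", and the window size of 2 tokens on each side are hypothetical choices made only to show the row/column orientation described above.

```r
# Hypothetical toy corpus of three short documents
docs <- c(doc1 = "the cat sat on the mat",
          doc2 = "the dog sat",
          doc3 = "a cat and a dog")

tokens <- lapply(docs, function(d) strsplit(d, " ")[[1]])
vocab  <- sort(unique(unlist(tokens)))

# Document-term matrix: rows = words, columns = documents, cells = raw counts
dtm <- sapply(tokens, function(tk) table(factor(tk, levels = vocab)))

dtm["the", "doc1"]  # 2

# Word-context matrix: rows = context terms, columns = keywords,
# cells = how often a term occurs within `window` tokens of the keyword.
# Keywords and window size are illustrative assumptions, not empson defaults.
keywords <- c("cat", "dog")
window   <- 2

wcm <- matrix(0L, nrow = length(vocab), ncol = length(keywords),
              dimnames = list(vocab, keywords))
for (tk in tokens) {
  for (i in seq_along(tk)) {
    if (!(tk[i] %in% keywords)) next
    lo  <- max(1, i - window)
    hi  <- min(length(tk), i + window)
    ctx <- tk[setdiff(lo:hi, i)]  # neighbours, excluding the keyword itself
    for (w in ctx) wcm[w, tk[i]] <- wcm[w, tk[i]] + 1L
  }
}

wcm["a", "cat"]  # 2
```

Both matrices share the same rows (the vocabulary); only the meaning of the columns, and therefore of each cell, changes between the two models.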


michaelgavin/empson documentation built on May 22, 2019, 9:50 p.m.