buildMatrix: Gathers word frequencies into a matrix.


View source: R/buildMatrix.R

Description

Gathers word frequencies into a matrix.

Usage

buildMatrix(dl, type = "docTerm", wordLimit = 5000, modelSize = 500,
  context = 5, method = "raw")

Arguments

dl

A docList object that contains the full text of each document in your corpus.

type

A string (either "docTerm" or "wordContext") that specifies the type of matrix to create. In a document-term matrix, each column is a document. In a word-context matrix, each column is a keyword.

wordLimit

A numeric value that limits the number of words (i.e., the number of rows) that will be counted. The default is 5,000 words.

modelSize

A numeric value, used for word-context matrices, that sets the number of keywords to be evaluated. The default is 500 and the maximum is 5,000. The most frequently occurring words (not counting stopwords) are selected automatically as keywords.

context

A numeric value, used for word-context matrices, that sets the size of the context window. At 5 (the default), the algorithm counts words that occur within 5 positions before or after each keyword.

method

A string, either "raw" or "proportional". Determines whether the word frequencies are reported as raw integers or in proportion to the whole. The default is "raw".
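The document-term layout and the two values of method can be illustrated with a small base R sketch. It does not call buildMatrix or reproduce its implementation: the toy docs list stands in for a docList, and it assumes that "proportional" means each count is divided by the total for its document (column), which the text above does not spell out.

```r
# Toy corpus: a named list of token vectors (hypothetical stand-in for a docList).
docs <- list(
  doc1 = c("the", "cat", "sat", "the", "mat"),
  doc2 = c("the", "dog", "sat")
)

# Vocabulary: every distinct word, sorted for a stable row order.
vocab <- sort(unique(unlist(docs)))

# Rows are words and columns are documents (the "docTerm" layout).
raw <- sapply(docs, function(tokens) {
  as.vector(table(factor(tokens, levels = vocab)))
})
rownames(raw) <- vocab

# Assumption: "proportional" divides each count by its document's total,
# so every column sums to 1.
proportional <- sweep(raw, 2, colSums(raw), "/")

print(raw)
print(proportional)
```

Under this reading, documents of different lengths become directly comparable in the proportional matrix, while the raw matrix preserves the original counts.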

Value

dm, a docMatrix object with frequency data for the corpus.

Examples

# dl is assumed to be an existing docList object
dm <- buildMatrix(dl)
dm <- buildMatrix(dl = dl, type = "docTerm", method = "proportional")
dm <- buildMatrix(dl = dl, type = "wordContext", wordLimit = 5000,
                  modelSize = 500, context = 5, method = "raw")
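For readers unfamiliar with word-context matrices, the roles of modelSize and context can be sketched in base R. This is an illustrative approximation only, not the package's code: the token vector, the stopword list, and the choice to keep every word (including stopwords) as a row are all assumptions made for the example.

```r
# Toy token stream and a hypothetical stopword list.
tokens <- c("the", "cat", "sat", "on", "the", "mat",
            "the", "cat", "sat", "the", "cat", "ran")
stopwords <- c("the", "on")
modelSize <- 2   # number of keywords (columns)
context <- 1     # window size: positions before and after each keyword

# Keywords: the most frequent words that are not stopwords.
content <- tokens[!tokens %in% stopwords]
freq <- sort(table(content), decreasing = TRUE)
keywords <- names(freq)[seq_len(min(modelSize, length(freq)))]

# Rows are words, columns are keywords (the "wordContext" layout).
vocab <- sort(unique(tokens))
wc <- matrix(0, nrow = length(vocab), ncol = length(keywords),
             dimnames = list(vocab, keywords))

# For each occurrence of a keyword, count the words inside its window.
for (kw in keywords) {
  for (pos in which(tokens == kw)) {
    lo <- max(1, pos - context)
    hi <- min(length(tokens), pos + context)
    window <- setdiff(lo:hi, pos)
    for (w in tokens[window]) {
      wc[w, kw] <- wc[w, kw] + 1
    }
  }
}

print(wc)
```

Widening context adds more distant neighbours to each column, and raising modelSize adds columns for less frequent keywords.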

michaelgavin/empson documentation built on May 22, 2019, 9:50 p.m.