dtm.create: Create a document term matrix from a list of tokens

Description

Create a DocumentTermMatrix from a list of document ids, terms, and frequencies.

Usage

dtm.create(documents, terms, freqs = rep(1, length(documents)), minfreq = 5,
  minlength = 3, filter.chars = TRUE, filter = rep(T, length(documents)))

Arguments

documents

a vector of document names/ids

terms

a vector of words of the same length as documents

freqs

a vector giving the frequency of each term in its document; defaults to 1 for every entry

minfreq

the minimum frequency a term must have to be included. Defaults to 5; set to 0 to skip this filter

minlength

the minimum word length (number of characters) a term must have to be included. Defaults to 3; set to 0 to skip this filter

filter.chars

if TRUE (the default), filter out any words containing numbers or non-word characters

filter

an optional logical vector, of the same length as documents, indicating whether each entry should be included. Any additional filtering is applied on top of this filter
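
To make the interplay of these arguments concrete, here is a minimal base-R sketch of the kind of filtering and tabulation they describe. It is not the package's implementation: `sketch_dtm` is a hypothetical helper, it returns a dense `xtabs` table rather than a DocumentTermMatrix, and it assumes that minfreq is compared against a term's total frequency summed over the entries that survive the other filters. The exact regular expression used for filter.chars is also an assumption.

```r
# Toy token list in "long" form: one entry per (document, term) pair.
documents <- c("d1", "d1", "d1", "d2", "d2", "d3")
terms     <- c("apple", "pie", "42", "apple", "an", "pie")
freqs     <- c(2, 1, 1, 3, 1, 1)

# Hypothetical helper that mimics the documented arguments.
# It is an illustration only, not the code behind dtm.create.
sketch_dtm <- function(documents, terms, freqs = rep(1, length(documents)),
                       minfreq = 5, minlength = 3, filter.chars = TRUE,
                       filter = rep(TRUE, length(documents))) {
  keep <- filter
  # Drop terms shorter than minlength characters.
  if (minlength > 0) keep <- keep & nchar(terms) >= minlength
  # Drop terms containing digits or non-word characters (assumed pattern).
  if (filter.chars) keep <- keep & !grepl("[0-9]|\\W", terms)
  # Drop terms whose total frequency is below minfreq (assumed semantics).
  if (minfreq > 0) {
    total <- tapply(freqs[keep], terms[keep], sum)
    keep <- keep & terms %in% names(total)[total >= minfreq]
  }
  d <- data.frame(documents, terms, freqs, stringsAsFactors = FALSE)
  # Cross-tabulate the surviving entries: rows are documents, columns are terms.
  xtabs(freqs ~ documents + terms, data = d[keep, , drop = FALSE])
}

m_all  <- sketch_dtm(documents, terms, freqs, minfreq = 0)
m_freq <- sketch_dtm(documents, terms, freqs, minfreq = 3)
print(m_all)
print(m_freq)
```

With minfreq = 0, only the short or numeric tokens ("42" and "an") are dropped. With minfreq = 3, "pie" (total frequency 2) is removed as well, and document d3 disappears from the table because none of its entries survive.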

Value

a document-term matrix (a DocumentTermMatrix object)


kasperwelbers/corpus-tools documentation built on May 20, 2019, 7:37 a.m.