build_dtm: build_dtm
In R.temis: Integrated Text Mining Solution


Compute a document-term matrix from a corpus.

build_dtm(
  corpus,
  sparsity = 1,
  dictionary = NULL,
  remove_stopwords = FALSE,
  tolower = TRUE,
  remove_punctuation = TRUE,
  remove_numbers = TRUE,
  min_length = 2
)

`corpus`	A `Corpus` object.
`sparsity`	Value between 0 and 1. A term is dropped when the proportion of documents in which it does not occur exceeds this value. By default all terms are kept (`sparsity=1`).
`dictionary`	A vector of terms to which the matrix should be restricted. By default, all words with at least `min_length` characters are considered.
`remove_stopwords`	Whether to remove stopwords appearing in a language-specific list (see `tm::stopwords`).
`tolower`	Whether to convert all text to lower case.
`remove_punctuation`	Whether to remove all punctuation from text before tokenizing terms.
`remove_numbers`	Whether to remove all numbers from text before tokenizing terms.
`min_length`	The minimal number of characters for a word to be retained.
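
To make the `sparsity` threshold concrete, here is a minimal base-R sketch on a toy count matrix. It does not call R.temis: the `counts` data and the variable names are illustrative assumptions, and the comparison follows the wording of the argument description above, so the package's handling of a term sitting exactly on the threshold may differ.

```r
# Toy counts (hypothetical data): 4 documents in rows, 3 terms in columns.
counts <- cbind(rare   = c(1, 0, 0, 0),
                medium = c(2, 1, 0, 0),
                common = c(1, 1, 1, 0))
rownames(counts) <- paste0("doc", 1:4)

# Share of documents in which each term never occurs:
# rare = 0.75, medium = 0.50, common = 0.25.
absent_share <- colMeans(counts == 0)

# Drop terms whose share of empty documents is above the threshold.
sparsity <- 0.6
kept <- counts[, absent_share <= sparsity, drop = FALSE]
colnames(kept)   # "medium" "common"
```

With `sparsity = 1`, no term can exceed the threshold, which matches the documented default of keeping all terms.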

A DocumentTermMatrix object.
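
As a rough picture of what such a matrix holds (one row per document, one column per term, counts in the cells), the following base-R sketch builds one by hand from a toy corpus. It uses neither R.temis nor tm and returns a plain matrix rather than a `DocumentTermMatrix` object; the `docs` vector and the `tokenize` helper are hypothetical and only loosely mimic the default `tolower`, `remove_punctuation`, `remove_numbers` and `min_length` settings.

```r
# Toy corpus (hypothetical data, not shipped with the package).
docs <- c(d1 = "Oil prices rise, oil stocks fall.",
          d2 = "Prices fall 3 percent in a day.")

# Hypothetical helper approximating the default preprocessing options.
tokenize <- function(text, min_length = 2) {
  text <- tolower(text)                    # cf. tolower = TRUE
  text <- gsub("[[:punct:]]", " ", text)   # cf. remove_punctuation = TRUE
  text <- gsub("[[:digit:]]", " ", text)   # cf. remove_numbers = TRUE
  words <- strsplit(text, "[[:space:]]+")[[1]]
  words[nchar(words) >= min_length]        # keep words of at least min_length characters
}

tokens <- lapply(docs, tokenize)
terms <- sort(unique(unlist(tokens)))

# One row per document, one column per term, cells hold term counts.
dtm <- t(sapply(tokens, function(w) table(factor(w, levels = terms))))
dtm
```

Here `"oil"` is counted twice in `d1`, while the one-letter word `"a"` and the number `3` do not appear as columns.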

file <- system.file("texts", "reut21578-factiva.xml", package="tm.plugin.factiva")
corpus <- import_corpus(file, "factiva", language="en")
build_dtm(corpus)