create_dtm: Create a Document-to-Term Matrix

View source: R/create_dtm.R

Description

Transform bags of words into a document-term matrix after filtering terms by overall frequency, document frequency, and tf-idf.

Usage

create_dtm(
  bow,
  min_term = 0,
  max_term = Inf,
  min_doc = 0,
  max_doc = Inf,
  nbterm = 1000,
  keep_terms = NULL,
  docvar = NULL
)

Arguments

bow

Tibble. Output of the function eval_bow. Document ids must be in a variable called "document".

min_term

Integer. Remove terms appearing fewer than this number of times.

max_term

Integer. Remove terms appearing more than this number of times.

min_doc

Integer. Remove terms appearing in fewer than this number of documents.

max_doc

Integer. Remove terms appearing in more than this number of documents.

nbterm

Integer. Select this number of terms based on tf-idf.

keep_terms

Character vector. Terms to keep even if they do not meet the other criteria.

docvar

Tibble. Additional information about documents to be appended to the docvar of the dtm. Document ids must be in a variable called "document".
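
How the filtering arguments interact can be sketched in base R on a toy bag of words. This is only an illustration under stated assumptions, not the package's implementation: the column names term and count are hypothetical (only "document" is documented), the tf-idf formula is one common variant, and the order of the steps (frequency filters, then tf-idf ranking, then keep_terms) is a guess.

```r
# Toy bag of words. `term` and `count` are assumed column names;
# only `document` is documented for create_dtm().
bow <- data.frame(
  document = c("d1", "d1", "d1", "d2", "d2", "d3", "d3"),
  term     = c("apple", "pear", "kiwi", "apple", "pear", "apple", "fig"),
  count    = c(3, 1, 2, 2, 4, 1, 1),
  stringsAsFactors = FALSE
)

# Corpus-wide frequency and number of documents containing each term
term_freq <- vapply(split(bow$count, bow$term), sum, numeric(1))
doc_freq  <- vapply(split(bow$document, bow$term),
                    function(d) length(unique(d)), integer(1))

# Thresholds named after the create_dtm() arguments
min_term <- 2; max_term <- Inf
min_doc  <- 2; max_doc  <- Inf
nbterm   <- 1
keep_terms <- "fig"

# Frequency filters: keep terms inside both ranges
passes <- term_freq >= min_term & term_freq <= max_term &
  doc_freq >= min_doc & doc_freq <= max_doc

# One common tf-idf variant, summed over documents (an assumption)
n_docs <- length(unique(bow$document))
tfidf  <- term_freq * log(n_docs / doc_freq)

# Rank the surviving terms by tf-idf and keep the top `nbterm`
candidates <- names(passes)[passes]
ranked     <- candidates[order(tfidf[candidates], decreasing = TRUE)]
top_terms  <- head(ranked, nbterm)

# Terms listed in keep_terms are added back regardless of the filters
selected <- union(top_terms, keep_terms)
print(selected)
```

In this sketch "kiwi" and "fig" fail min_doc because each appears in a single document, "pear" outranks "apple" on tf-idf, and "fig" is retained only because it is listed in keep_terms.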

Value

A document-term matrix.
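
For readers unfamiliar with the layout, a document-term matrix has one row per document, one column per term, and counts in the cells. The sketch below builds one with base R's xtabs() as a stand-in; the class of the object actually returned by create_dtm is not stated here, and the column names term and count are hypothetical.

```r
# Toy bag of words; `term` and `count` are assumed column names
bow <- data.frame(
  document = c("d1", "d1", "d2", "d3"),
  term     = c("apple", "pear", "apple", "fig"),
  count    = c(3, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Cross-tabulate counts: rows are documents, columns are terms
dtm <- xtabs(count ~ document + term, data = bow)
print(dtm)
```

Document-term pairs that are absent from the bag of words appear as zeros in the matrix.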

Author(s)

Nicolas Mangin

See Also

eval_bow


NicolasJBM/lexR documentation built on Feb. 4, 2021, 6:43 p.m.