transform_dfm: Applies bounds, weights, and/or coarsening schemes to a dfm...

View source: R/transform_dfm.R

transform_dfm    R Documentation

Description

Applies bounds, weights, and/or coarsening schemes to a dfm (document-feature matrix) to reduce the dimension of the data, reduce noise, or apply other design rules (e.g., to exclude words that occur in too few or too many documents).

Usage

transform_dfm(x, bounds, tfidf = FALSE, verbose = TRUE)

Arguments

x

a matrix text representation with rows corresponding to each document in a corpus and columns that represent summary measures of the text (e.g., word counts, topic proportions, etc.). Acceptable forms include a valid quanteda dfm object, a tm Document-Term Matrix, or a matrix of estimated topic proportions.

bounds

a vector of lower and upper bounds to enforce. Defaults to excluding any terms that appear in only one document and any terms that appear in every document.

tfidf

optional scheme to use for weighting the DTM. Defaults to FALSE.

verbose

indicator for verbosity. Defaults to TRUE.
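
The effect of the default bounds described above can be sketched in base R. This is only an illustration of document-frequency filtering on a toy count matrix, not the package's implementation; the object names (x, doc_freq, lower, upper, x_bounded) and the way the bounds are expressed are assumptions made for the example.

```r
# Toy document-term count matrix: 4 documents (rows) x 4 terms (columns).
x <- matrix(
  c(1, 0, 2, 1,
    1, 1, 0, 0,
    1, 0, 0, 3,
    1, 0, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), c("the", "rare", "cat", "dog"))
)

# Document frequency: number of documents in which each term appears.
doc_freq <- colSums(x > 0)

# Hypothetical bounds (an assumption for illustration): drop terms that
# appear in only one document and terms that appear in every document.
lower <- 2
upper <- nrow(x) - 1

# Keep only the columns whose document frequency lies within the bounds.
keep <- doc_freq >= lower & doc_freq <= upper
x_bounded <- x[, keep, drop = FALSE]

print(colnames(x_bounded))
```

Here "the" occurs in every document and "rare" in only one, so both columns are dropped and the matrix shrinks from four terms to two.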

Value

A bounded DFM
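
For readers unfamiliar with tf-idf weighting, the sketch below shows one common variant in base R: raw term counts multiplied by log(N / document frequency). Several tf-idf definitions exist, and this page does not say which one transform_dfm applies, so the formula and all object names here are assumptions for illustration only.

```r
# Toy document-term count matrix: 3 documents (rows) x 3 terms (columns).
x <- matrix(
  c(2, 0, 1,
    0, 3, 1,
    1, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("doc", 1:3), c("cat", "dog", "the"))
)

n_docs   <- nrow(x)          # number of documents, N
doc_freq <- colSums(x > 0)   # documents containing each term

# Inverse document frequency (assumed variant): log(N / df).
idf <- log(n_docs / doc_freq)

# Multiply every column of counts by that term's idf.
x_tfidf <- sweep(x, 2, idf, "*")

print(round(x_tfidf, 3))
```

Under this variant a term that appears in every document (here "the") receives an idf of zero, which is one reason such terms are often removed by an upper bound before weighting.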


reaganmozer/textmatch documentation built on March 7, 2024, 2:41 p.m.