remove_infrequent_terms: Remove infrequently occurring terms from a quanteda dfm.
In preText: Diagnostics to Assess the Effects of Text Preprocessing Decisions


Removes from a dfm the terms that appear in fewer than a specified proportion of the documents in a corpus.
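The filtering rule can be sketched on an ordinary base R count matrix, with documents in rows and terms in columns. The helper `remove_infrequent_columns` below is hypothetical and written only for illustration; it assumes a term is kept when the share of documents containing it is at least `proportion_threshold`, and that supplying `indices` drops those columns instead. It is not preText's source code.

```r
# Hypothetical helper for illustration only; not preText's implementation.
# m: numeric matrix with documents in rows and terms in columns.
remove_infrequent_columns <- function(m, proportion_threshold = 0.01,
                                      indices = NULL) {
  if (is.null(indices)) {
    # Document frequency: number of documents with a nonzero count per term
    doc_freq <- colSums(m > 0)
    # Keep terms whose share of documents meets the threshold
    keep <- doc_freq / nrow(m) >= proportion_threshold
  } else {
    # Drop the user-supplied column indices instead
    keep <- !(seq_len(ncol(m)) %in% indices)
  }
  m[, keep, drop = FALSE]
}

# Toy document-term matrix: 4 documents, 3 terms (filled column by column)
m <- matrix(c(1, 0, 0, 0,
              2, 3, 0, 1,
              0, 0, 0, 5),
            nrow = 4,
            dimnames = list(paste0("doc", 1:4), c("rare", "common", "once")))

colnames(remove_infrequent_columns(m, proportion_threshold = 0.5))  # "common"
colnames(remove_infrequent_columns(m, indices = 2))                 # "rare" "once"
```

Here "common" appears in 3 of 4 documents (0.75), while "rare" and "once" each appear in only 1 of 4 (0.25), so a threshold of 0.5 keeps just "common".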

remove_infrequent_terms(dfm_object, proportion_threshold = 0.01, indices = NULL, verbose = TRUE)

`dfm_object`	A quanteda dfm object.
`proportion_threshold`	Minimum proportion of documents in which a term must appear to be retained in the dfm. Defaults to 0.01.
`indices`	Defaults to NULL. If not NULL, it must be a numeric vector giving the column indices of the terms to remove. Useful for removing specific terms.
`verbose`	Logical indicating whether progress information about preprocessing should be printed to the screen. Defaults to TRUE.

A reduced dfm with the infrequent (or user-specified) terms removed.

## Not run: 
# load the package
library(preText)
# load in the data
data("UK_Manifestos")
# preprocess data
preprocessed_documents <- factorial_preprocessing(
    UK_Manifestos,
    use_ngrams = TRUE,
    infrequent_term_threshold = 0.02,
    verbose = TRUE)
# remove terms that appear in fewer than 50% of documents
updated_dfm <- remove_infrequent_terms(preprocessed_documents$dfm_list[[1]],
                                       proportion_threshold = 0.5,
                                       indices = NULL,
                                       verbose = TRUE)

## End(Not run)
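If only a proportion-based filter is needed outside preText, quanteda's own `dfm_trim()` offers a comparable operation through `min_docfreq` with `docfreq_type = "prop"`, and a dfm can be subset by negative column index much like the `indices` argument. This is a swapped-in quanteda approach shown for comparison, not what `remove_infrequent_terms` calls internally, and it assumes quanteda version 3 or later, where `dfm()` takes a tokens object.

```r
library(quanteda)

# Four toy documents; "tax" appears in three of them
toks <- tokens(c(d1 = "tax cuts now", d2 = "tax rises",
                 d3 = "more tax", d4 = "schools"))
x <- dfm(toks)

# Keep only terms found in at least 50% of documents
trimmed <- dfm_trim(x, min_docfreq = 0.5, docfreq_type = "prop")
featnames(trimmed)  # "tax"

# Remove a specific term by column index ("tax" is the first feature)
without_first <- x[, -1]
featnames(without_first)
```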