remove_infrequent_terms: Remove infrequently occurring terms from a quanteda dfm.


View source: R/remove_infrequent_terms.R

Description

Removes from a dfm the terms that appear in less than a specified proportion of the documents in a corpus.

Usage

remove_infrequent_terms(
  dfm_object,
  proportion_threshold = 0.01,
  indices = NULL,
  verbose = TRUE
)

Arguments

dfm_object

A quanteda dfm object.

proportion_threshold

The minimum proportion of documents a term must appear in to be retained in the dfm. Defaults to 0.01.

indices

Defaults to NULL. If not NULL, then it must be a numeric vector specifying the column indices of terms the user would like to remove. Useful for removing specific terms.

verbose

Logical indicating whether more information should be printed to the screen to let the user know about progress in preprocessing. Defaults to TRUE.

Value

A reduced dfm.
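
To make the filtering rule concrete, here is a minimal base R sketch of the idea described above. It is not the package's implementation: `drop_infrequent` is a hypothetical helper, a plain count matrix stands in for a quanteda dfm, and two behaviours are assumptions on my part (terms exactly at the threshold are kept, and any `indices` are removed before the threshold is applied).

```r
# Toy document-term counts: 4 documents (rows) by 3 terms (columns).
# A plain matrix stands in for a quanteda dfm in this sketch.
counts <- matrix(
  c(1, 0, 2,
    3, 0, 1,
    0, 0, 4,
    2, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), c("tax", "rare", "health"))
)

# Hypothetical helper, not part of preText.
drop_infrequent <- function(m, proportion_threshold = 0.01, indices = NULL) {
  # Assumption: explicitly listed columns are removed first.
  if (!is.null(indices) && length(indices) > 0) {
    m <- m[, -indices, drop = FALSE]
  }
  # Share of documents in which each term occurs at least once.
  doc_proportion <- colSums(m > 0) / nrow(m)
  # Assumption: terms exactly at the threshold are kept.
  m[, doc_proportion >= proportion_threshold, drop = FALSE]
}

reduced <- drop_infrequent(counts, proportion_threshold = 0.5)
colnames(reduced)
```

In this toy matrix "rare" occurs in one of four documents (a proportion of 0.25), so a threshold of 0.5 drops it and leaves "tax" and "health".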

Examples

## Not run: 
# load the package
library(preText)
# load in the data
data("UK_Manifestos")
# preprocess data
preprocessed_documents <- factorial_preprocessing(
    UK_Manifestos,
    use_ngrams = TRUE,
    infrequent_term_threshold = 0.02,
    verbose = TRUE)
# apply a stricter 0.5 document-proportion threshold to the first dfm
updated_dfm <- remove_infrequent_terms(preprocessed_documents$dfm_list[[1]],
                                       proportion_threshold = 0.5,
                                       indices = NULL,
                                       verbose = TRUE)

## End(Not run)

matthewjdenny/preptest documentation built on July 27, 2021, 1:19 a.m.