Removes from a dfm the terms that appear in fewer than a specified proportion of the documents in a corpus.
remove_infrequent_terms(dfm_object, proportion_threshold = 0.01,
  indices = NULL, verbose = TRUE)
dfm_object |
A quanteda dfm object. |
proportion_threshold |
Minimum proportion of documents in which a term must appear to be kept in the dfm. Defaults to 0.01. |
indices |
Defaults to NULL. If not NULL, then it must be a numeric vector specifying the column indices of terms the user would like to remove. Useful for removing specific terms. |
verbose |
Logical indicating whether more information should be printed to the screen to let the user know about progress in preprocessing. Defaults to TRUE. |
A reduced dfm.
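The filtering described above can be illustrated with a minimal base-R sketch. This is not the package's implementation: a plain matrix stands in for a quanteda dfm, the names toy_counts, doc_proportion, threshold and reduced_counts are hypothetical, and keeping terms whose proportion is at or above the threshold is an assumption based on the description.

```r
# Toy document-term matrix: 4 documents (rows) by 3 terms (columns).
# A plain base-R matrix stands in for a quanteda dfm (assumption for illustration).
toy_counts <- matrix(
  c(1, 0, 2,
    3, 0, 1,
    0, 0, 4,
    2, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), c("tax", "rare", "health"))
)

# Share of documents in which each term appears at least once.
doc_proportion <- colSums(toy_counts > 0) / nrow(toy_counts)

# Keep only the terms that reach the threshold (hypothetical value).
threshold <- 0.5
reduced_counts <- toy_counts[, doc_proportion >= threshold, drop = FALSE]

print(colnames(reduced_counts))
```

Here "rare" occurs in only one of the four documents, so it falls below the 0.5 threshold and is dropped, while the other two terms are kept.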
## Not run:
# load the package
library(preText)
# load in the data
data("UK_Manifestos")
# preprocess data
preprocessed_documents <- factorial_preprocessing(
    UK_Manifestos,
    use_ngrams = TRUE,
    infrequent_term_threshold = 0.02,
    verbose = TRUE)
updated_dfm <- remove_infrequent_terms(preprocessed_documents$dfm_list[[1]],
                                       proportion_threshold = 0.5,
                                       indices = NULL,
                                       verbose = TRUE)
## End(Not run)