dtm_remove_lowfreq | R Documentation |
Remove terms occurring with low frequency from a Document-Term-Matrix and documents with no terms
dtm_remove_lowfreq(dtm, minfreq = 5, maxterms, remove_emptydocs = TRUE)
dtm | an object returned by |
minfreq | integer with the minimum number of times the term should occur in order to keep the term |
maxterms | integer indicating the maximum number of terms which should be kept in the |
remove_emptydocs | logical indicating whether to remove documents that contain no terms after the term removal is executed. Defaults to TRUE |
a sparse Matrix as returned by sparseMatrix, where terms with low occurrence are removed and documents without any terms are also removed
library(udpipe)

## Build a document-term matrix from the nouns in the example data
data(brussels_reviews_anno)
x <- subset(brussels_reviews_anno, xpos == "NN")
x <- x[, c("doc_id", "lemma")]
x <- document_term_frequencies(x)
dtm <- document_term_matrix(x)

## Remove terms with low frequencies and documents with no terms
x <- dtm_remove_lowfreq(dtm, minfreq = 10)
dim(x)

## Additionally keep at most 25 terms
x <- dtm_remove_lowfreq(dtm, minfreq = 10, maxterms = 25)
dim(x)

## Same, but keep documents that end up without any terms
x <- dtm_remove_lowfreq(dtm, minfreq = 10, maxterms = 25, remove_emptydocs = FALSE)
dim(x)
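To make the effect of minfreq and remove_emptydocs easier to see, here is a minimal sketch on a toy matrix that uses only the Matrix package. It is not the package's implementation: the toy data, the threshold of 3, and the reading of "number of times the term should occur" as the column total over all documents are assumptions made for illustration.

```r
library(Matrix)

# Hypothetical toy document-term matrix: 3 documents x 4 terms
dtm <- sparseMatrix(
  i = c(1, 1, 2, 2, 3),
  j = c(1, 2, 1, 3, 4),
  x = c(4, 1, 3, 2, 1),
  dims = c(3, 4),
  dimnames = list(c("doc1", "doc2", "doc3"),
                  c("beer", "city", "park", "tram"))
)

# Assumption: a term's frequency is its total count over all documents
term_totals <- colSums(dtm)

# Keep only terms whose total count reaches the (illustrative) threshold of 3
kept <- dtm[, term_totals >= 3, drop = FALSE]

# Mimic remove_emptydocs = TRUE: drop documents left without any term
kept <- kept[rowSums(kept) > 0, , drop = FALSE]

dim(kept)
```

In this toy case only "beer" reaches the threshold, so "doc3", which contained only "tram", is left empty and is dropped. Skipping the last filtering step corresponds to remove_emptydocs = FALSE, where the empty document stays in the matrix.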