wordcounts_remove_rare: Remove infrequent words

Remove infrequent words

Description

Filter out the words in a wordcounts dataframe whose overall frequency rank falls below a threshold, so that only the most frequent words are kept.

Usage

wordcounts_remove_rare(counts, n)

Arguments

counts

The dataframe from read_wordcounts

n

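The maximum rank to keep. Words are ranked by total frequency across all documents (rank 1 is the most frequent), and all words ranked below n, i.e. with a rank greater than n, are discarded.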
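To make the meaning of n concrete, here is a minimal base-R sketch of rank-based filtering. It is not the dfrtopics source. It assumes that `counts` has the columns `id`, `word` and `weight` (the count of `word` in document `id`), and that tied words share the best rank in their group. The function name `remove_rare_sketch` and the toy data are hypothetical.

```r
# Hypothetical sketch, not the dfrtopics implementation. Assumes `counts`
# has columns `id`, `word`, and `weight` (count of `word` in document `id`).
remove_rare_sketch <- function(counts, n) {
  # Total frequency of each word across all documents (named numeric vector)
  totals <- vapply(split(counts$weight, counts$word), sum, numeric(1))
  # Rank 1 = most frequent; tied words share the smallest rank in their group
  ranks <- rank(-totals, ties.method = "min")
  # Keep words whose rank is at most n, then keep all of their rows
  keep <- names(totals)[ranks <= n]
  counts[counts$word %in% keep, , drop = FALSE]
}

# Toy data: totals are the = 5, cat = 3, dog = 1, cta = 1
counts <- data.frame(
  id     = c("d1", "d1", "d2", "d2", "d3", "d3"),
  word   = c("the", "cat", "the", "cat", "dog", "cta"),
  weight = c(3, 2, 2, 1, 1, 1),
  stringsAsFactors = FALSE
)

pruned <- remove_rare_sketch(counts, 2)
unique(pruned$word)  # "the" "cat"
```

With n = 2, only the two most frequent words survive, and every row that mentions them is retained, so per-document counts for the kept words are unchanged.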

Details

It's often useful to prune documents of one-off words (many of which are OCR errors) before building MALLET instances. This is a convenience function for doing so.

Value

A filtered word-counts dataframe. Because of ties, do not expect it to have exactly n distinct words.
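The page does not say how ties are ranked, so the following base-R illustration is only an assumption about why the count can differ from n. If tied words share the best rank in their group (as with `rank(..., ties.method = "min")`), words tied at the cutoff all pass and the result can contain more than n distinct words; breaking ties by position would admit exactly n. The vector `totals` is made-up data.

```r
# Made-up per-word totals: "dog" and "cta" tie for third place
totals <- c(the = 5, cat = 3, dog = 1, cta = 1)
n <- 3

# Tied words share the lowest rank in their group, so both pass the cutoff
kept_min <- names(totals)[rank(-totals, ties.method = "min") <= n]
# Breaking ties by position admits exactly n words
kept_first <- names(totals)[rank(-totals, ties.method = "first") <= n]

length(kept_min)    # 4
length(kept_first)  # 3
```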


agoldst/dfrtopics documentation built on July 15, 2022, 4:13 p.m.