get_bottom_terms: Get List of Least Frequent Terms

Description

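Similar to get_sparse, but looks at word frequency rather than document count. If count is a whole number, returns that many of the least frequent terms. If count is a decimal, it is interpreted as a percent and the bottom share of terms is returned (for example, .05 returns the bottom 5% of terms).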

Usage

get_bottom_terms(wf, nterms, count)

Arguments

wf

A data table containing the word and document frequencies across the corpus.

nterms

A number specifying the total number of unique words in the corpus.

count

A number, either decimal or whole; a decimal is interpreted as a percent, a whole number as a count.

Value

words A character vector of the least frequent terms

Examples

## Not run: 
infreq = get_bottom_terms(wf, 100000, 5000) #returns 5000 least common terms
infreq = get_bottom_terms(wf, 100000, .05) #returns the bottom 5% of terms

## End(Not run)
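
The example above is not run and needs an existing wf table. As a rough illustration of the count-versus-percent behaviour described here, the following self-contained sketch uses a toy table. It is not the package's implementation: the function name bottom_terms_sketch, the column names word and freq, and the rule that a value below 1 is treated as a proportion of nterms are all assumptions made for illustration.

```r
# Hypothetical stand-in for get_bottom_terms(); NOT the package's code.
# Assumes `wf` has columns `word` and `freq` (both names are assumptions).
bottom_terms_sketch <- function(wf, nterms, count) {
  # Assumption: a value below 1 is a proportion of the nterms unique words.
  if (count < 1) {
    count <- round(nterms * count)
  }
  # Sort ascending by frequency and keep the first `count` words.
  ordered <- wf[order(wf$freq), ]
  head(as.character(ordered$word), count)
}

# Toy word-frequency table with made-up counts.
wf <- data.frame(
  word = c("the", "corpus", "zymurgy", "quokka", "text"),
  freq = c(120, 15, 1, 2, 40),
  stringsAsFactors = FALSE
)

by_count <- bottom_terms_sketch(wf, nrow(wf), 2)    # whole number: 2 rarest words
by_share <- bottom_terms_sketch(wf, nrow(wf), 0.4)  # decimal: bottom 40% of 5 words
```

With five unique words, a count of 2 and a share of 0.4 select the same two terms, which is a quick way to check that both interpretations agree on a small table.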

avkoehl/textprocessingDSI documentation built on June 5, 2019, 7:41 p.m.