get_abundant: Get List of Abundant Terms


Description

Analyzes the output of summary_corpus, similar to get_spare. If abundance is a decimal, returns the words that appear in at least that proportion of the documents. If abundance is a whole number, returns the words that appear in at least that many documents.

Usage

get_abundant(wf, ndocs, abundance)

Arguments

wf

A data table containing the word and document frequencies across the corpus.

ndocs

A number specifying the total number of unique documents in the corpus.

abundance

A number, either decimal or whole. A decimal is interpreted as a proportion of the documents; a whole number is interpreted as a document count.
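The two interpretations of abundance can be illustrated with a minimal sketch. This is not the package's implementation: the function name abundant_sketch, the column names word and doc_freq, and the rule that any value below 1 counts as a proportion are all assumptions made for illustration.

```r
# Hypothetical sketch, not the package's code.
# Assumes `wf` has columns `word` and `doc_freq` (assumed names).
abundant_sketch <- function(wf, ndocs, abundance) {
  # Assumption: a value below 1 is a proportion, converted to a document count;
  # anything else is already a document count.
  min_docs <- if (abundance < 1) abundance * ndocs else abundance
  wf$word[wf$doc_freq >= min_docs]
}

# Toy word-frequency table for a corpus of 100 documents
wf <- data.frame(
  word = c("the", "data", "rare"),
  doc_freq = c(100, 96, 3),
  stringsAsFactors = FALSE
)

by_proportion <- abundant_sketch(wf, 100, .95)  # at least 95% of documents
by_count <- abundant_sketch(wf, 100, 95)        # at least 95 documents
```

With 100 documents, both calls apply the same threshold, so both select "the" and "data" and leave out "rare".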

Value

words A character vector of all the abundant terms.

Examples

## Not run: 
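# Terms that appear in at least 95% of the 100 documents
abundant <- get_abundant(wf, 100, .95)
# Terms that appear in at least 95 documents
abundant <- get_abundant(wf, 100, 95)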

## End(Not run)

avkoehl/textprocessingDSI documentation built on June 5, 2019, 7:41 p.m.