get_sparse: Get List of Sparse Terms


Description

Analyzes the output of summary_corpus and returns the sparse terms. If sparsity is passed as a decimal, it is read as a percentage, and the function returns the words that appeared in that percentage of the documents or fewer. If sparsity is a whole number, it is read as a count, and the function returns the words that appeared in that many documents or fewer.
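The threshold rule described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the function name `get_sparse_sketch`, the column names `word` and `doc_freq`, the use of a plain data frame, and the rule that any value below 1 counts as a decimal are all assumptions made for the example.

```r
# Hypothetical sketch of the thresholding logic; not the package source.
# Assumes `wf` has a `word` column and a `doc_freq` column (assumed names).
get_sparse_sketch <- function(wf, ndocs, sparsity) {
  # Assumption: a value below 1 is a proportion of documents,
  # anything else is a document count.
  max_docs <- if (sparsity < 1) sparsity * ndocs else sparsity
  # Keep the words whose document frequency is at or below the threshold.
  as.character(wf$word[wf$doc_freq <= max_docs])
}

# Toy word-frequency table for a corpus of 100 documents.
wf <- data.frame(
  word = c("the", "corpus", "zebra", "quark"),
  doc_freq = c(100, 40, 2, 1),
  stringsAsFactors = FALSE
)

sparse_by_share <- get_sparse_sketch(wf, 100, .03)  # decimal form
sparse_by_count <- get_sparse_sketch(wf, 100, 3)    # whole-number form
```

With 100 documents, the two calls apply the same cutoff of 3 documents, so both select the same words from the toy table.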

Usage

get_sparse(wf, ndocs, sparsity)

Arguments

wf

A data table containing the word and document frequencies across the corpus.

ndocs

A number specifying the total number of unique documents in the corpus.

sparsity

A number, either decimal or whole. A decimal is interpreted as a percentage of documents; a whole number is interpreted as a document count.

Value

words A character vector of all the sparse terms.

Examples

## Not run: 
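# Decimal sparsity: interpreted as a percentage of the 100 documents
sparse <- get_sparse(wf, 100, .03)
# Whole-number sparsity: interpreted as a document count
sparse <- get_sparse(wf, 100, 3)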

## End(Not run)

avkoehl/textprocessingDSI documentation built on June 5, 2019, 7:41 p.m.