bestDocs: Find Informative Documents in a Corpus


bestDocs    R Documentation

Find Informative Documents in a Corpus

Description

Find the documents in a corpus that contain the most high-frequency phrases, and return a corpus containing only those documents.

Usage

bestDocs(co, num = 3L, n = 10L, pd = NULL)

Arguments

co

A corpus with documents

num

Integer giving the number of documents to return (default 3).

n

Integer giving the number of high-frequency phrases to use (default 10).

pd

A phraseDoc object for the corpus in co. If NULL (the default), a phraseDoc is created from co.

Value

A corpus containing the num documents that have the most high-frequency phrases, ordered by the number of high-frequency phrases they contain. Each document in the returned corpus has the meta field oldIdx set to its index in the original corpus, and the meta field hfPhrases set to the number of high-frequency phrases it contains.

Examples

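library(phm)  # provides phraseDoc() and bestDocs()

# Four short documents to mine for phrases
v1 <- c("Here is some text to test phrase mining", "phrase mining is fun",
        "Some text is better than no text", "No text, no phrase mining")
co <- tm::VCorpus(tm::VectorSource(v1))
pd <- phraseDoc(co, min.freq = 2)
bestDocs(co, 2, 2, pd)  # return 2 documents, using 2 high-frequency phrases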
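
The meta fields described under Value can be read back from the returned corpus. The following is a minimal sketch, assuming that oldIdx and hfPhrases are stored as document-level metadata readable with tm::meta(); the variable name best is chosen here for illustration only.

```r
library(phm)

# Same toy corpus as in the example above
v1 <- c("Here is some text to test phrase mining", "phrase mining is fun",
        "Some text is better than no text", "No text, no phrase mining")
co <- tm::VCorpus(tm::VectorSource(v1))
pd <- phraseDoc(co, min.freq = 2)

# Keep the result so its documents can be inspected
best <- bestDocs(co, num = 2L, n = 2L, pd = pd)

# Assumption: oldIdx and hfPhrases are document-level meta fields
for (i in seq_along(best)) {
  doc <- best[[i]]
  cat("original index:", tm::meta(doc, "oldIdx"),
      " high-frequency phrases:", tm::meta(doc, "hfPhrases"), "\n")
}
```

If the fields are stored at a different metadata level, the tm::meta() calls would need to be adjusted accordingly.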

phm documentation built on June 8, 2022, 1:05 a.m.