characteristic_docs: characteristic_docs
In R.temis: Integrated Text Mining Solution


Print the documents that are most characteristic of each level of a variable, i.e. those with the lowest Chi-squared distance to the average vocabulary of the documents belonging to that level.
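The ranking criterion can be illustrated with a minimal base R sketch. It assumes the usual correspondence-analysis definition of the Chi-squared distance between row profiles, and it assumes that only the documents of a level are ranked against that level's average; the package's actual computation may differ. The toy counts, the `group` vector and the helper `chisq_dist_to_level` are made up for illustration and are not part of R.temis.

```r
# Toy document-term counts: 4 documents, 3 terms (made-up data)
counts <- matrix(c(4, 1, 0,
                   3, 2, 0,
                   0, 1, 5,
                   1, 0, 4),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("doc", 1:4),
                                 c("oil", "price", "trade")))
# Hypothetical grouping variable, one value per document
group <- c("a", "a", "b", "b")

# Row profiles: the term distribution of each document
profiles <- counts / rowSums(counts)
# Column masses: the overall share of each term in the whole matrix
col_mass <- colSums(counts) / sum(counts)

# Hypothetical helper: distance of each document of a level
# to the average vocabulary of that level, sorted ascending
chisq_dist_to_level <- function(level) {
  in_level <- group == level
  level_counts <- counts[in_level, , drop = FALSE]
  # Average vocabulary: pooled term distribution of the level's documents
  avg <- colSums(level_counts) / sum(level_counts)
  # Squared profile differences, weighted by the inverse column masses
  diff <- sweep(profiles[in_level, , drop = FALSE], 2, avg, "-")
  d <- sqrt(rowSums(sweep(diff^2, 2, col_mass, "/")))
  sort(d)
}

ranked <- sapply(unique(group), chisq_dist_to_level, simplify = FALSE)
ranked
```

Under these assumptions, the documents at the top of each sorted vector are the ones whose vocabulary is closest to the average of their level.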

characteristic_docs(corpus, dtm, variable, ndocs = 10, nterms = 25, p = 0.1)

`corpus`	A `Corpus` object.
`dtm`	A `DocumentTermMatrix` object corresponding to `corpus`.
`variable`	A vector of values giving the groups for which the most characteristic documents should be reported.
`ndocs`	The number of (most characteristic) documents to print.
`nterms`	The number of terms to highlight in documents.
`p`	The maximum p-value up to which specific terms should be highlighted.

Occurrences of the `nterms` most specific terms for each level are highlighted. If stemming or other transformations have been applied to the original words using `combine_terms`, all original words that have been transformed to the specified terms are highlighted.
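The idea of highlighting every original word behind a combined term can be sketched in base R. This is only an illustration: the helper `highlight_terms`, the `stem_map` list and the asterisk markers are hypothetical, and the way R.temis actually renders highlights is not described here.

```r
# Hypothetical mapping from a combined term to the original words it replaced
stem_map <- list(price = c("price", "prices", "priced"))

# Hypothetical helper: wrap each whole-word occurrence in asterisks.
# Words are assumed to contain no regular-expression metacharacters.
highlight_terms <- function(text, words) {
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  gsub(pattern, "*\\1*", text, ignore.case = TRUE, perl = TRUE)
}

highlighted <- highlight_terms("Oil prices rose as OPEC priced in demand.",
                               unlist(stem_map))
highlighted
```

Matching on the original words rather than on the stem is what lets "prices" and "priced" both be marked even though the term in the matrix is "price".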

A list with one `Corpus` object for each level, returned invisibly.
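An invisible return value is not auto-printed at the console, but it can still be assigned and inspected. The base R sketch below shows the mechanism with a hypothetical stand-in function, `print_by_level`, which is not part of R.temis and only mimics the "print, then return a per-level list invisibly" pattern.

```r
# Hypothetical stand-in: prints a summary per level, returns a list invisibly
print_by_level <- function(values, groups) {
  res <- split(values, groups)
  for (l in names(res)) {
    cat("Level", l, ":", length(res[[l]]), "item(s)\n")
  }
  invisible(res)
}

# Called on its own, only the summary lines are printed
print_by_level(1:4, c("a", "a", "b", "b"))

# Assigning the result keeps the per-level list for later use
res <- print_by_level(1:4, c("a", "a", "b", "b"))
str(res)
```

In the same way, the result of `characteristic_docs` can be assigned to a variable when the per-level objects are needed after printing.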

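# Import the sample Factiva corpus shipped with tm.plugin.factiva
file <- system.file("texts", "reut21578-factiva.xml", package="tm.plugin.factiva")
corpus <- import_corpus(file, "factiva", language="en")
# Build the document-term matrix
dtm <- build_dtm(corpus)
# Print the most characteristic documents for each date
characteristic_docs(corpus, dtm, meta(corpus)$Date)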

# Also works when terms have been combined
dict <- dictionary(dtm)
dtm2 <- combine_terms(dtm, dict)
characteristic_docs(corpus, dtm2, meta(corpus)$Date)