Description
Print documents which are the most characteristic of each level of a variable, i.e. those with the lowest Chi-squared distance to the average vocabulary of documents belonging to that level.
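The ranking criterion can be illustrated with a small base-R sketch. This is not the R.temis implementation. It assumes a plain count matrix with documents in rows (a tm DocumentTermMatrix can be converted with `as.matrix()`). It takes the pooled term frequencies of a level as that level's "average vocabulary" and weights squared differences by overall term frequency, as in correspondence analysis. The function name `chisq_characteristic()` and the toy matrix are hypothetical.

```r
# Hypothetical helper, not part of R.temis: ranks the documents of each level
# by the chi-squared distance between their term profile and the level's
# average profile (assumed here to be the level's pooled term frequencies).
chisq_characteristic <- function(dtm, variable, ndocs = 10) {
  dtm <- as.matrix(dtm)
  # Drop empty documents and unused terms to avoid divisions by zero
  keep <- rowSums(dtm) > 0
  dtm <- dtm[keep, colSums(dtm[keep, , drop = FALSE]) > 0, drop = FALSE]
  variable <- factor(variable[keep])

  # Row profiles: relative term frequencies within each document
  profiles <- dtm / rowSums(dtm)
  # Column masses over the whole corpus, used as chi-squared weights
  col_mass <- colSums(dtm) / sum(dtm)

  res <- lapply(levels(variable), function(lev) {
    rows <- which(variable == lev)
    sub <- dtm[rows, , drop = FALSE]
    # Average vocabulary of the level: pooled relative term frequencies
    center <- colSums(sub) / sum(sub)
    diffs <- sweep(profiles[rows, , drop = FALSE], 2, center)
    dist <- rowSums(sweep(diffs^2, 2, col_mass, "/"))
    # Smallest distances first: the most characteristic documents
    head(sort(dist), ndocs)
  })
  names(res) <- levels(variable)
  res
}

# Toy document-term matrix: four documents, three terms
m <- matrix(c(3, 0, 1,
              2, 1, 1,
              0, 2, 2,
              0, 3, 1), nrow = 4, byrow = TRUE,
            dimnames = list(paste0("doc", 1:4), c("oil", "price", "bank")))
chisq_characteristic(m, c("A", "A", "A", "B"), ndocs = 2)
```

In this toy example, doc2 comes first for level A (distance 2/45, about 0.044) because its profile is closest to the pooled profile of documents 1 to 3.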
Usage

characteristic_docs(corpus, dtm, variable, ndocs = 10, nterms = 25, p = 0.1)
Arguments

corpus: A Corpus object.
dtm: A DocumentTermMatrix object.
variable: A vector of values giving the groups (levels) for which the most characteristic documents should be printed.
ndocs: The number of (most characteristic) documents to print for each level.
nterms: The number of terms to highlight in documents.
p: The maximum p-value up to which specific terms should be highlighted.
Details

Occurrences of the nterms most specific terms for each level (among those with a p-value of at most p) are highlighted. If stemming or other transformations have been applied to the original words using combine_terms, all original words which have been transformed into the specified terms are highlighted.
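The highlighting rule for combined terms amounts to a reverse lookup: every original word whose dictionary entry points to one of the selected terms is marked, along with the terms themselves. The sketch below assumes the dictionary is a named character vector (original word to term), which may differ from the format returned by `dictionary()`; the helper `highlight_terms()` and the asterisk markup are hypothetical, not the R.temis output format.

```r
# Hypothetical helper, not part of R.temis: wraps in asterisks every word of
# `text` that is either one of `terms` or an original word mapped to one of
# them in `dict` (a named character vector: names = words, values = terms).
# Assumes words contain no regular-expression metacharacters.
highlight_terms <- function(text, terms, dict) {
  words <- union(names(dict)[dict %in% terms], terms)
  if (length(words) == 0) return(text)
  # A single alternation, so each occurrence is wrapped at most once
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  gsub(pattern, "*\\1*", text, ignore.case = TRUE, perl = TRUE)
}

dict <- c(oil = "oil", oils = "oil", price = "price", prices = "price")
highlight_terms("Oil prices rose while oils fell.", "oil", dict)
# "*Oil* prices rose while *oils* fell."
```

Both "Oil" and "oils" are marked because they map to the combined term "oil", while "prices" is left alone since "price" was not selected.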
Value

A list with one Corpus object for each level, returned invisibly.
Examples

library(R.temis)

# Import the sample Factiva file shipped with tm.plugin.factiva
file <- system.file("texts", "reut21578-factiva.xml", package="tm.plugin.factiva")
corpus <- import_corpus(file, "factiva", language="en")
dtm <- build_dtm(corpus)

# Most characteristic documents for each publication date
characteristic_docs(corpus, dtm, meta(corpus)$Date)

# Also works when terms have been combined
dict <- dictionary(dtm)
dtm2 <- combine_terms(dtm, dict)
characteristic_docs(corpus, dtm2, meta(corpus)$Date)