Description
List terms most associated (positively or negatively) with each document or each of a variable's levels.

Usage
specific_terms(
  dtm,
  variable = NULL,
  p = 0.1,
  n = 25,
  sparsity = 1,
  min_occ = 2
)

Arguments
dtm: A DocumentTermMatrix containing the terms.
variable: An optional vector of values giving the groups for which specific terms should be reported. If NULL, terms are reported for each document.
p: The maximum p-value up to which terms should be reported.
n: The maximum number of terms to report (for each group, if applicable).
sparsity: A value between 0 and 1 indicating the proportion of documents with no occurrences of a term above which that term should be dropped. By default all terms are kept (sparsity = 1).
min_occ: The minimum number of occurrences in the whole dtm below which a term should be skipped.
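To make the sparsity and min_occ rules concrete, here is a minimal base-R sketch that applies them, as stated above, to a plain document-by-term count matrix. The helper filter_terms() and the toy matrix are hypothetical illustrations, not the R.temis implementation, which operates on a DocumentTermMatrix.

```r
# Hypothetical helper (not part of R.temis): keep only the terms (columns)
# that satisfy the sparsity and min_occ rules described above.
filter_terms <- function(counts, sparsity = 1, min_occ = 2) {
  # Proportion of documents in which each term never occurs
  zero_share <- colMeans(counts == 0)
  # Total number of occurrences of each term across all documents
  total <- colSums(counts)
  # Drop terms that are too sparse or too rare
  keep <- zero_share <= sparsity & total >= min_occ
  counts[, keep, drop = FALSE]
}

# Toy document-by-term counts (rows = documents, columns = terms)
counts <- matrix(c(3, 0, 1,
                   0, 0, 0,
                   2, 2, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("doc", 1:3), c("oil", "gas", "price")))

filter_terms(counts)                  # "price" occurs only once, so it is dropped
filter_terms(counts, sparsity = 0.5)  # "gas" is absent from 2 of 3 documents, so it is dropped too
```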

Details
Specific terms reported here are those whose observed frequency in the document or level has the lowest probability under a hypergeometric distribution, based on their global frequencies in the corpus and on the number of occurrences of all terms in the document or variable level considered. The positive or negative character of the association is visible from the sign of the t value, or by comparing the value of the "% Term/Level" column with that of the "Global %" column.
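The following base-R sketch shows one way to compute such a hypergeometric probability and a signed t value for a single term in a single level, using phyper() and qnorm(). The helper term_specificity() is hypothetical, and the tail choice (upper tail when the term occurs at least as often as expected in the level, lower tail otherwise) is an assumption about how "extreme" is measured, not taken from the R.temis source.

```r
# Hypothetical sketch (not the R.temis source): specificity of one term in one level.
#   k: occurrences of the term in the level
#   K: occurrences of the term in the whole corpus
#   n: occurrences of all terms in the level
#   N: occurrences of all terms in the whole corpus
term_specificity <- function(k, K, n, N) {
  # Expected count of the term in the level if it were spread evenly
  expected <- n * K / N
  if (k >= expected) {
    # Over-representation: probability of observing at least k occurrences
    prob <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
    t_value <- qnorm(prob, lower.tail = FALSE)  # positive quantile
  } else {
    # Under-representation: probability of observing at most k occurrences
    prob <- phyper(k, K, N - K, n)
    t_value <- qnorm(prob)                      # negative quantile
  }
  c("t value" = t_value, "Prob." = prob)
}

# A term seen 20 times in a 100-token level but only 30 times in a 1000-token corpus
term_specificity(20, 30, 100, 1000)
# A frequent term (300 of 1000 tokens) that never appears in the level
term_specificity(0, 300, 100, 1000)
```

In the first call the expected count in the level is only 3, so the term is strongly over-represented and gets a large positive t value; in the second the expected count is 30, so observing none yields a negative t value.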
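
Value
A list of matrices, one for each level of the variable (or for each document if variable is NULL), with columns:
"% Term/Level": the percentage of the term's occurrences among all term occurrences in the level.
"% Level/Term": the percentage of the term's occurrences that appear in the level (rather than in other levels).
"Global %": the percentage of the term's occurrences among all term occurrences in the corpus.
"Level": the number of occurrences of the term in the level ("internal").
"Global": the number of occurrences of the term in the corpus.
"t value": the quantile of a normal distribution corresponding to the probability "Prob.".
"Prob.": the probability of observing such an extreme (high or low) number of occurrences of the term in the level, under a hypergeometric distribution.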
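As an illustration of how the descriptive columns relate to each other, this minimal sketch computes them for one level from a plain level-by-term count matrix. The helper level_columns() and the toy counts are hypothetical; the real function works from a DocumentTermMatrix and also returns the "t value" and "Prob." columns.

```r
# Hypothetical helper (not part of R.temis): descriptive columns for one level,
# from a level-by-term count matrix (rows = levels, columns = terms).
level_columns <- function(counts, level) {
  level_counts <- counts[level, ]    # occurrences of each term in the level
  global_counts <- colSums(counts)   # occurrences of each term in the corpus
  cbind(
    "% Term/Level" = 100 * level_counts / sum(level_counts),
    "% Level/Term" = 100 * level_counts / global_counts,
    "Global %"     = 100 * global_counts / sum(global_counts),
    "Level"        = level_counts,
    "Global"       = global_counts
  )
}

counts <- matrix(c(6, 2, 2,
                   4, 8, 18),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("A", "B"), c("oil", "gas", "price")))
level_columns(counts, "A")
```

Here "oil" makes up 60% of the tokens in level A against 25% in the whole corpus, so it would be positively associated with A. For a small dtm, a level-by-term matrix of this kind could be built with base R's rowsum(as.matrix(dtm), variable).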
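
Examples
library(R.temis)
# Sample Reuters-21578 articles in Factiva XML format shipped with tm.plugin.factiva
file <- system.file("texts", "reut21578-factiva.xml", package = "tm.plugin.factiva")
corpus <- import_corpus(file, "factiva", language = "en")
dtm <- build_dtm(corpus)
# Terms specific to each document
specific_terms(dtm)
# Terms specific to each publication date
specific_terms(dtm, meta(corpus)$Date)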