summarize_topics | R Documentation
Summarizes topics in a model. Called by tidylda and refit.tidylda and used to augment print.tidylda.
summarize_topics(theta, beta, dtm)
theta: numeric matrix whose rows represent P(topic|document)
beta: numeric matrix whose rows represent P(token|topic)
dtm: a document term matrix or term co-occurrence matrix of class
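The three arguments must line up dimensionally: theta is documents by topics, beta is topics by tokens, and dtm is documents by tokens. A minimal base R sketch that builds hypothetical toy inputs and checks those relationships is shown below. The variable names and the use of a dense base matrix for dtm are assumptions for illustration only; the class the function actually expects is not stated above.

```r
# Hypothetical toy inputs illustrating the expected shapes (not tidylda code).
set.seed(1)
n_docs <- 4; n_topics <- 2; n_terms <- 5

# theta: documents x topics; each row is P(topic | document) and sums to 1
theta <- matrix(runif(n_docs * n_topics), nrow = n_docs)
theta <- theta / rowSums(theta)   # divides row i by its own sum

# beta: topics x terms; each row is P(token | topic) and sums to 1
beta <- matrix(runif(n_topics * n_terms), nrow = n_topics)
beta <- beta / rowSums(beta)
colnames(beta) <- paste0("term", seq_len(n_terms))

# dtm: documents x terms of raw counts. A dense base matrix is an
# assumption made here for simplicity.
dtm <- matrix(rpois(n_docs * n_terms, lambda = 3), nrow = n_docs,
              dimnames = list(NULL, colnames(beta)))

# Dimension checks tying the three arguments together
stopifnot(
  nrow(theta) == nrow(dtm),   # one row of theta per document
  ncol(theta) == nrow(beta),  # same number of topics
  ncol(beta) == ncol(dtm),    # same vocabulary
  isTRUE(all.equal(rowSums(theta), rep(1, n_docs))),
  isTRUE(all.equal(rowSums(beta), rep(1, n_topics)))
)
```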
Returns a tibble with the following columns:
topic: the integer row number of beta.
prevalence: the frequency of each topic throughout the corpus it was trained on, normalized so that it sums to 100.
coherence: the result of a call to calc_prob_coherence using the default 5 most-probable terms in each topic.
top_terms: the top 5 most-probable terms in each topic.
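The top_terms column amounts to ranking each row of beta. A minimal sketch of that ranking follows; top_terms_sketch() is a hypothetical helper, not a tidylda function, and joining the terms with commas is an assumed display format.

```r
# Hypothetical helper (not part of tidylda): for each topic (row of beta),
# return its n most-probable terms joined into one string.
top_terms_sketch <- function(beta, n = 5) {
  stopifnot(!is.null(colnames(beta)))
  out <- apply(beta, 1, function(p) {
    # order(-p) sorts term indices from highest to lowest probability
    idx <- order(-p)[seq_len(min(n, length(p)))]
    paste(colnames(beta)[idx], collapse = ", ")
  })
  unname(out)
}

beta <- rbind(
  c(0.40, 0.30, 0.10, 0.10, 0.05, 0.05),  # topic 1
  c(0.05, 0.05, 0.10, 0.20, 0.25, 0.35)   # topic 2
)
colnames(beta) <- c("cat", "dog", "fish", "bird", "tree", "rock")

top_terms_sketch(beta)
# [1] "cat, dog, fish, bird, tree" "rock, tree, bird, fish, cat"
```

Ties are broken by column order because order() is stable.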
prevalence
should be proportional to P(topic). Each document's topic distribution (its row of theta) is weighted by the document's length, so topics prevalent in longer documents get more weight than topics prevalent in shorter documents. The calculation is:
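library(magrittr)  # provides %>%
# Parentheses are needed: %>% binds tighter than *
prevalence <- (rowSums(dtm) * theta) %>% colSums()
prevalence <- (prevalence / sum(prevalence) * 100) %>% round(3)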
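The document-length weighting described above can be checked by hand on toy numbers. The sketch below uses base R without the pipe so the grouping is explicit; the matrices are made-up illustrative values, not output from tidylda.

```r
# Toy theta: documents x topics, rows are P(topic | document)
theta <- rbind(
  c(0.9, 0.1),  # doc 1: mostly topic 1
  c(0.2, 0.8)   # doc 2: mostly topic 2
)
# Toy dtm: documents x terms of raw counts
dtm <- rbind(
  c(8, 2, 0),   # doc 1 has 10 tokens
  c(1, 1, 3)    # doc 2 has 5 tokens
)

doc_len <- rowSums(dtm)          # tokens per document: 10, 5
weighted <- doc_len * theta      # row i of theta scaled by doc_len[i]
prevalence <- colSums(weighted)  # expected token count per topic: 10, 5
prevalence <- round(prevalence / sum(prevalence) * 100, 3)

prevalence
# [1] 66.667 33.333
```

Without weighting, colMeans(theta) would give 55 and 45; weighting by length shifts prevalence toward topic 1 because the document it dominates is twice as long.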
An alternative calculation (not implemented here) might have been:
prevalence <- (colSums(dtm) * t(beta)) %>% colSums()
prevalence <- (prevalence / sum(prevalence) * 100) %>% round(3)
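The alternative weights each term's corpus frequency by P(token|topic) instead of weighting documents by length. A minimal base R sketch on made-up toy matrices, again with explicit parentheses in place of the pipe, shows the arithmetic:

```r
# Toy beta: topics x terms, rows are P(token | topic)
beta <- rbind(
  c(0.5, 0.3, 0.2),  # topic 1
  c(0.1, 0.3, 0.6)   # topic 2
)
# Toy dtm: documents x terms of raw counts
dtm <- rbind(c(8, 2, 0), c(1, 1, 3))

term_freq <- colSums(dtm)        # corpus count per term: 9, 3, 3
weighted <- term_freq * t(beta)  # row j of t(beta) scaled by term_freq[j]
alt <- colSums(weighted)         # per topic: 6.0, 3.6
alt <- round(alt / sum(alt) * 100, 3)

alt
# [1] 62.5 37.5
```

Topic 1 comes out ahead here because the most frequent term (9 of 15 tokens) has its highest probability in topic 1.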