| summarize_topics | R Documentation |
Summarizes topics in a model. It is called by tidylda and refit.tidylda, and its output is used to augment print.tidylda.
summarize_topics(theta, beta, dtm)
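| Argument | Description |
|---|---|
| theta | numeric matrix whose rows represent P(topic\|document), one row per document and one column per topic |
| beta | numeric matrix whose rows represent P(token\|topic), one row per topic and one column per token |
| dtm | a document term matrix or term co-occurrence matrix of class |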
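The arguments above imply consistent shapes: with D documents, K topics and V tokens, theta is D x K, beta is K x V, and dtm is D x V. Below is a minimal sketch of a shape check, assuming dtm is a document term matrix (not a co-occurrence matrix) stored as a dense base-R matrix. `check_topic_inputs()` is a hypothetical helper written for illustration, not part of tidylda, and the toy data is made up.

```r
# Hypothetical helper (not part of tidylda): checks that theta, beta and dtm
# have mutually consistent shapes and that theta/beta rows are probability
# distributions. Assumes dtm is a document term matrix.
check_topic_inputs <- function(theta, beta, dtm, tol = 1e-8) {
  stopifnot(
    nrow(theta) == nrow(dtm),            # one row of theta per document
    ncol(theta) == nrow(beta),           # one row of beta per topic
    ncol(beta) == ncol(dtm),             # one column of beta per token
    all(abs(rowSums(theta) - 1) < tol),  # each row is P(topic|document)
    all(abs(rowSums(beta) - 1) < tol)    # each row is P(token|topic)
  )
  invisible(TRUE)
}

# Toy inputs: 3 documents, 2 topics, 4 tokens
theta <- matrix(c(0.9, 0.1,
                  0.5, 0.5,
                  0.2, 0.8), nrow = 3, byrow = TRUE)
beta <- matrix(c(0.4, 0.3, 0.2, 0.1,
                 0.1, 0.1, 0.3, 0.5), nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("apple", "banana", "cherry", "date")))
dtm <- matrix(c(5, 3, 1, 1,
                2, 2, 2, 2,
                0, 1, 4, 5), nrow = 3, byrow = TRUE,
              dimnames = list(NULL, colnames(beta)))

check_topic_inputs(theta, beta, dtm)
```

A dtm with the wrong number of columns (for example `dtm[, 1:3]`) makes the check stop with an error.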
Returns a tibble with the following columns:
- topic: the integer row number of beta.
- prevalence: the frequency of each topic throughout the corpus the model was trained on, normalized so that the column sums to 100.
- coherence: computed by a call to calc_prob_coherence, using its default of the 5 most-probable terms in each topic.
- top_terms: the 5 most-probable terms in each topic.
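The topic and top_terms columns can be derived from beta alone. This is a minimal sketch, assuming beta carries token names as column names; `top_terms_sketch()` is a hypothetical helper, and joining the terms with ", " is an assumed display format, not necessarily the one tidylda prints. It returns a base data.frame rather than a tibble to stay dependency-free.

```r
# Hypothetical helper: for each row (topic) of beta, pick the n tokens with the
# highest P(token|topic) and join them into one display string.
top_terms_sketch <- function(beta, n = 5) {
  n <- min(n, ncol(beta))  # cannot show more terms than the vocabulary has
  terms <- apply(beta, 1, function(p) {
    idx <- order(p, decreasing = TRUE)[seq_len(n)]
    paste(colnames(beta)[idx], collapse = ", ")
  })
  data.frame(topic = seq_len(nrow(beta)),  # integer row number of beta
             top_terms = terms,
             stringsAsFactors = FALSE)
}

# Toy beta: 2 topics over 4 named tokens
beta <- matrix(c(0.4, 0.3, 0.2, 0.1,
                 0.1, 0.1, 0.3, 0.5), nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("apple", "banana", "cherry", "date")))
res <- top_terms_sketch(beta, n = 2)
# topic 1 -> "apple, banana"; topic 2 -> "date, cherry"
```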
prevalence should be proportional to P(topic). It is calculated by weighting each document's topic distribution by the document's length, so topics prevalent in longer documents get more weight than topics prevalent in shorter documents. The calculation is
prevalence <- (rowSums(dtm) * theta) %>% colSums()  # parentheses needed: %>% binds more tightly than *
prevalence <- (prevalence / sum(prevalence) * 100) %>% round(3)
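A worked example of the document-length weighting described above, on made-up toy data (3 documents, 2 topics). It is a sketch that uses base R calls instead of `%>%` so it runs without magrittr, and a dense matrix stands in for a sparse dtm.

```r
# Toy inputs: theta is 3 documents x 2 topics, dtm is 3 documents x 4 tokens
theta <- matrix(c(0.9, 0.1,
                  0.5, 0.5,
                  0.2, 0.8), nrow = 3, byrow = TRUE)
dtm <- matrix(c(5, 3, 1, 1,
                2, 2, 2, 2,
                0, 1, 4, 5), nrow = 3, byrow = TRUE)

doc_length <- rowSums(dtm)        # 10, 8, 10 tokens per document
weighted   <- doc_length * theta  # row i of theta scaled by doc_length[i]
prevalence <- colSums(weighted)   # expected token count per topic: 15, 13
prevalence <- round(prevalence / sum(prevalence) * 100, 3)
prevalence                        # 53.571 46.429
```

Document 2 is shorter than documents 1 and 3, so its even split between the topics counts for less than the strongly one-sided documents.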
An alternative calculation (not implemented here) would instead weight each topic's token distribution by corpus-wide token counts:
prevalence <- (colSums(dtm) * t(beta)) %>% colSums()
prevalence <- (prevalence / sum(prevalence) * 100) %>% round(3)
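A toy run of the alternative, beta-based calculation described above, again in base R on made-up data. Because this beta is invented rather than fitted to the dtm, its result need not match the theta-based prevalence; the sketch only shows the mechanics of weighting by token counts.

```r
# Toy inputs: dtm is 3 documents x 4 tokens, beta is 2 topics x 4 tokens
dtm <- matrix(c(5, 3, 1, 1,
                2, 2, 2, 2,
                0, 1, 4, 5), nrow = 3, byrow = TRUE)
beta <- matrix(c(0.4, 0.3, 0.2, 0.1,
                 0.1, 0.1, 0.3, 0.5), nrow = 2, byrow = TRUE)

token_count    <- colSums(dtm)           # 7, 6, 7, 8 occurrences per token
weighted       <- token_count * t(beta)  # row j of t(beta) scaled by token_count[j]
alt_prevalence <- colSums(weighted)      # 6.8, 7.4
alt_prevalence <- round(alt_prevalence / sum(alt_prevalence) * 100, 3)
alt_prevalence                           # 47.887 52.113
```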