View source: R/topic_coherence.R
topic_coherence | R Documentation |
Calculate the coherence of each topic in a fitted topic model, using the N highest-probability tokens for each topic
topic_coherence(topic_model, dtm_data, top_n_tokens = 10, smoothing_beta = 1)
topic_model |
a fitted topic model object from one of the following:
|
dtm_data |
a document-term matrix of token counts coercible to |
top_n_tokens |
an integer indicating the number of top tokens to consider for each topic; the default is 10 |
smoothing_beta |
a numeric value used to smooth the document frequencies in order to avoid taking the log of zero; the default is 1 |
A vector of topic coherence scores with length equal to the number of topics in the fitted model
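For reference, the coherence measure defined by Mimno et al. (2011) can be written as below. This is a sketch that assumes smoothing_beta takes the place of the constant 1 used in the original paper; the symbols M, v, D, and \beta are notation introduced here for illustration, not names taken from the package.

```latex
C(t) = \sum_{m=2}^{M} \sum_{l=1}^{m-1}
  \log \frac{D\left(v_m^{(t)}, v_l^{(t)}\right) + \beta}{D\left(v_l^{(t)}\right)}
```

Here M corresponds to top_n_tokens, v_1^{(t)}, ..., v_M^{(t)} are the top tokens of topic t in decreasing order of probability, D(v) is the number of documents containing token v, D(v, v') is the number of documents containing both tokens, and \beta corresponds to smoothing_beta. Higher scores indicate top tokens that tend to appear in the same documents.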
Mimno, D., Wallach, H. M., Talley, E., Leenders, M., & McCallum, A. (2011, July). "Optimizing semantic coherence in topic models." In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 262-272). Association for Computational Linguistics.
McCallum, Andrew Kachites. "MALLET: A Machine Learning for Language Toolkit." https://mallet.cs.umass.edu 2002.
semanticCoherence
# Using the example from the LDA function
library(topicmodels)
data("AssociatedPress", package = "topicmodels")
lda <- LDA(AssociatedPress[1:20,], control = list(alpha = 0.1), k = 2)
topic_coherence(lda, AssociatedPress[1:20,])
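To make the score easier to interpret, the following minimal sketch computes a Mimno et al. (2011) style coherence by hand for a single topic on a toy matrix. The helper coherence_sketch and the toy counts are hypothetical and are not part of the package; the sketch assumes the top tokens are supplied in decreasing order of probability and that each of them occurs in at least one document.

```r
# Toy document-term matrix: 4 documents, 3 tokens (made-up counts)
dtm <- matrix(
  c(2, 1, 0,
    1, 0, 0,
    0, 3, 1,
    1, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("apple", "banana", "cherry"))
)

# Hypothetical helper, not the package's implementation
coherence_sketch <- function(dtm, top_tokens, beta = 1) {
  # 1 if the token occurs in the document, 0 otherwise
  present <- (dtm[, top_tokens, drop = FALSE] > 0) * 1
  # Diagonal: document frequency of each token
  # Off-diagonal: number of documents containing both tokens
  co_df <- crossprod(present)
  score <- 0
  for (m in seq_along(top_tokens)[-1]) {
    for (l in seq_len(m - 1)) {
      # Smoothed co-document frequency over the document frequency
      # of the higher-ranked token
      score <- score + log((co_df[m, l] + beta) / co_df[l, l])
    }
  }
  score
}

coherence_sketch(dtm, c("apple", "banana", "cherry"))
```

With beta = 1, a pair of tokens that never co-occurs still contributes a finite (negative) term, which is the log-zero issue that smoothing_beta is meant to avoid.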