topic_coherence: Calculate the topic coherence for each topic in a topic model



Description

Using the N highest-probability tokens for each topic, calculate the coherence of each topic

Usage

topic_coherence(topic_model, dtm_data, top_n_tokens = 10,
  smoothing_beta = 1)

Arguments

topic_model

a fitted topic model object of one of the following classes: tm-class

dtm_data

a document-term matrix of token counts coercible to simple_triplet_matrix

top_n_tokens

an integer indicating the number of top tokens to consider; the default is 10

smoothing_beta

a numeric value used to smooth the document frequencies in order to avoid taking the log of zero; the default is 1
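For orientation, the coherence measure defined in Mimno et al. (2011), cited under References, can be written as below. Here M corresponds to top_n_tokens, D(v) is the number of documents containing token v, and D(v, v') is the number of documents containing both tokens. The symbol β standing in for smoothing_beta is an assumption about how this function maps onto the paper, where the smoothing count is fixed at 1.

```latex
C\left(t; V^{(t)}\right) = \sum_{m=2}^{M} \sum_{l=1}^{m-1}
  \log \frac{D\left(v_m^{(t)}, v_l^{(t)}\right) + \beta}{D\left(v_l^{(t)}\right)}
```

Without the smoothing term, any pair of top tokens that never co-occur in a document would contribute log(0).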

Value

A vector of topic coherence scores with length equal to the number of topics in the fitted model

References

Mimno, D., Wallach, H. M., Talley, E., Leenders, M., & McCallum, A. (2011, July). "Optimizing semantic coherence in topic models." In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 262-272). Association for Computational Linguistics.

McCallum, A. K. (2002). "MALLET: A Machine Learning for Language Toolkit." http://mallet.cs.umass.edu

See Also

semanticCoherence

Examples

# Using the example from the LDA function
library(topicmodels)
data("AssociatedPress", package = "topicmodels")
lda <- LDA(AssociatedPress[1:20,], control = list(alpha = 0.1), k = 2)
topic_coherence(lda, AssociatedPress[1:20,])
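To make the quantity being computed more concrete, here is a minimal sketch of a Mimno-style coherence score for a single topic on a toy count matrix. The helper `coherence_sketch` and the `toy` matrix are hypothetical and are not part of topicdoc; the sketch assumes a dense matrix and a token list already ordered by topic probability, and it may differ from the package's internals in ordering or edge-case handling.

```r
# Hypothetical helper, not part of topicdoc: Mimno-style coherence for one topic.
# dtm:       dense document-term count matrix with token names as column names
# top_terms: character vector of the topic's top tokens, most probable first
# beta:      smoothing count added to the co-document frequency
coherence_sketch <- function(dtm, top_terms, beta = 1) {
  # Binary incidence: does each document contain each top token at all?
  present <- (dtm[, top_terms, drop = FALSE] > 0) * 1
  # Off-diagonal entries are co-document frequencies;
  # diagonal entries are plain document frequencies.
  co_df <- crossprod(present)
  score <- 0
  for (m in seq_along(top_terms)[-1]) {
    for (l in seq_len(m - 1)) {
      # Assumes every top token occurs in at least one document,
      # so the denominator is never zero.
      score <- score + log((co_df[m, l] + beta) / co_df[l, l])
    }
  }
  score
}

# Toy corpus: 4 documents, 3 tokens
toy <- matrix(c(1, 1, 0,
                2, 0, 1,
                0, 3, 1,
                1, 1, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(NULL, c("a", "b", "c")))

# "a" appears in 3 documents and co-occurs with "c" in 1 of them
coherence_sketch(toy, c("a", "c"))
```

Scores closer to zero indicate that the top tokens tend to appear in the same documents; strongly negative scores indicate tokens that rarely co-occur.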

topicdoc documentation built on Oct. 30, 2019, 11:26 a.m.