semanticCoherence | R Documentation |
Calculate semantic coherence (Mimno et al 2011) for an STM model.
semanticCoherence(model, documents, M = 10)
model |
the STM object |
documents |
the STM formatted documents (see |
M |
the number of top words to consider per topic |
Semantic coherence is a metric related to pointwise mutual information that was introduced in a paper by David Mimno, Hanna Wallach and colleagues (see references). The paper details a series of manual evaluations which show that their metric is a reasonable surrogate for human judgment. The core idea is that in semantically coherent models, the words that are most probable under a topic should co-occur within the same documents.
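To make the co-occurrence idea concrete, here is a minimal sketch of the score as defined in Mimno et al. (2011): for a topic's top words v_1, ..., v_M, ordered from most to least probable, the coherence is the sum over m = 2..M and l = 1..m-1 of log((D(v_m, v_l) + 1) / D(v_l)), where D(v) counts the documents containing word v and D(v, v') counts the documents containing both words. The toy matrix dtm and the helper toy_coherence are hypothetical names made up for this illustration; this is not the package's implementation, which may differ in details.

```r
# Toy illustration of the Mimno et al. (2011) formula; NOT the stm implementation.
# `dtm` is a hypothetical binary document-term matrix: rows are documents,
# columns are words, and a 1 means the word appears in that document.
dtm <- matrix(
  c(1, 1, 0,
    1, 1, 0,
    1, 0, 1,
    0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("tax", "budget", "soccer"))
)

# top_words: a topic's top words, ordered from most to least probable.
toy_coherence <- function(dtm, top_words) {
  # Presence/absence indicators for the selected words, as a numeric matrix.
  present <- (dtm[, top_words, drop = FALSE] > 0) * 1
  # cooc[i, j] = number of documents containing both word i and word j.
  cooc <- crossprod(present)
  # The diagonal holds each word's document frequency D(v).
  doc_freq <- unname(diag(cooc))
  total <- 0
  for (m in seq_along(top_words)[-1]) {
    for (l in seq_len(m - 1)) {
      total <- total + log((cooc[m, l] + 1) / doc_freq[l])
    }
  }
  total
}

# Words that share documents score higher than words that never co-occur.
toy_coherence(dtm, c("tax", "budget"))
toy_coherence(dtm, c("budget", "soccer"))
```

In this toy corpus, "tax" and "budget" appear together in two documents while "budget" and "soccer" never do, so the first pair receives the higher (less negative) score.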
One of our observations in Roberts et al. 2014 was that semantic coherence alone is relatively easy to achieve by having only a couple of topics, all of which are dominated by the most common words. We therefore suggest that users also consider exclusivity, which provides a natural counterpoint.
This function is currently marked with the keyword internal because it does not have much error checking.
a numeric vector containing semantic coherence for each topic
Mimno, D., Wallach, H. M., Talley, E., Leenders, M., & McCallum, A. (2011, July). "Optimizing semantic coherence in topic models." In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 262-272). Association for Computational Linguistics.
Roberts, M., Stewart, B., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S., Albertson, B., et al. (2014). "Structural topic models for open ended survey responses." American Journal of Political Science, 58(4), 1064-1082.
searchK
plot.searchK
exclusivity
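library(stm)

# Process the raw text and keep the metadata aligned with the documents.
temp <- textProcessor(documents = gadarian$open.ended.response, metadata = gadarian)
meta <- temp$meta
vocab <- temp$vocab
docs <- temp$documents

# Prepare the documents for stm and keep the matching vocab and metadata.
out <- prepDocuments(docs, vocab, meta)
docs <- out$documents
vocab <- out$vocab
meta <- out$meta

set.seed(02138)
# Maximum EM iterations set very low so the example will run quickly.
# Run your models to convergence!
mod.out <- stm(docs, vocab, 3, prevalence = ~treatment + s(pid_rep), data = meta,
               max.em.its = 5)

# One coherence value per topic, using the default M = 10 top words.
semanticCoherence(mod.out, docs)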