align_topics | R Documentation
Given information about the dissimilarities among topics across
a set of models, this function attempts to identify groups of
similar topics across the models. In particular, it greedily seeks
the single-link clustering in which no two topics from the same
model are found in the same cluster ("up-to-one mapping"). The idea
is from Chuang et al. (2015). The implementation is my own (slow,
unverified, experimental) one. To prepare topic dissimilarities to
supply to this function, use model_distances.
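As a rough illustration of the greedy, constrained single-link idea described above, here is a minimal sketch in base R. It is not this package's implementation. The edge-list input (one row per pair of topics from different models, with columns m1, t1, m2, t2 and dissimilarity d) and the helper name greedy_align are hypothetical; no assumption is made about the actual structure returned by model_distances. Pairs are visited from most to least similar, as in Kruskal's algorithm for single linkage, and a merge is skipped whenever it would place two topics from the same model in one cluster.

```r
# Sketch of greedy single-link clustering with an "up-to-one mapping"
# constraint. greedy_align() and its edge-list input are hypothetical.
greedy_align <- function(edges, n_topics, threshold) {
  # One node per (model, topic); each node starts in its own cluster.
  nodes <- do.call(rbind, lapply(seq_along(n_topics), function(m)
    data.frame(model = m, topic = seq_len(n_topics[m]))))
  key <- function(m, t) paste(m, t, sep = ":")
  cl <- setNames(seq_len(nrow(nodes)), key(nodes$model, nodes$topic))
  merged_at <- setNames(rep(NA_real_, nrow(nodes)), names(cl))

  # Visit candidate pairs from least to most dissimilar.
  edges <- edges[order(edges$d), ]
  for (i in seq_len(nrow(edges))) {
    e <- edges[i, ]
    if (e$d > threshold) break  # every remaining pair is farther apart
    ca <- cl[[key(e$m1, e$t1)]]
    cb <- cl[[key(e$m2, e$t2)]]
    if (ca == cb) next          # already in the same cluster
    members <- names(cl)[cl %in% c(ca, cb)]
    # Up-to-one mapping: refuse a merge that would put two topics
    # from the same model into one cluster.
    if (anyDuplicated(sub(":.*", "", members)) > 0) next
    # Record the merge distance only for topics joining a cluster
    # for the first time.
    first <- members[is.na(merged_at[members])]
    merged_at[first] <- e$d
    cl[cl == cb] <- ca
  }
  # Renumber clusters 1, 2, ... and split the results by model.
  ids <- match(cl, unique(cl))
  list(clusters = split(ids, nodes$model),
       distances = split(unname(merged_at), nodes$model))
}

# Toy data: two models with two topics each.
edges <- data.frame(m1 = 1, t1 = c(1, 2, 1, 2),
                    m2 = 2, t2 = c(1, 2, 2, 1),
                    d  = c(0.1, 0.2, 0.3, 0.4))
res <- greedy_align(edges, n_topics = c(2, 2), threshold = 0.5)
```

With threshold = 0.5, topic 1 of each model ends up in cluster 1 and topic 2 of each model in cluster 2. The pairs at 0.3 and 0.4 are refused because either merge would join both topics of model 1 in a single cluster. With threshold = 0.15 only the closest pair merges, and the other topics stay in their own clusters.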
align_topics(dst, threshold)
print.topic_alignment(x)
dst | result from model_distances
threshold | maximum dissimilarity allowed between merging clusters. By default, the threshold is set so that any two topics from different models may ultimately join a cluster. A more aggressive (lower) threshold is recommended, so that topics without a close match in the other models are left isolated.
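One way to reason about the threshold is to look at the spread of cross-model dissimilarities. The sketch below uses made-up numbers and assumes that a pair is admitted as a possible merge when its dissimilarity does not exceed the threshold. The quantile rule is only a hypothetical heuristic for a stricter cutoff; the source does not prescribe it.

```r
# Toy cross-model topic dissimilarities (made-up numbers for illustration).
d <- c(0.05, 0.12, 0.30, 0.41, 0.55, 0.62, 0.80, 0.93)

# Assuming pairs with d <= threshold may merge, a threshold at or above the
# largest cross-model distance never blocks a merge on distance grounds,
# like the permissive default described above.
permissive <- max(d)

# A more aggressive, hypothetical choice: allow merges only for the closest
# quarter of pairs (R's default type 7 quantile).
aggressive <- unname(quantile(d, 0.25))

# Number of pairs each threshold admits as possible merges.
c(permissive = sum(d <= permissive), aggressive = sum(d <= aggressive))
```

Here the permissive cutoff admits all 8 pairs while the quantile cutoff (0.255) admits only the 2 closest, so most topics would remain unmatched.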
A topic_alignment object, which is a list with the following components:
clusters
list of vectors, one for each model, giving cluster numbers of the topics in the model
distances
list of vectors, one for each model, giving the distance at which the given topic merged into its cluster. Because single-link clustering (if I've even implemented it correctly) is subject to "chaining," this distance is not necessarily an indication of a cluster's quality, but it may give some hints.
model_distances
The supplied model_distances
threshold
The threshold used
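The chaining caveat about the distances component is easy to see with base R's hclust(), which performs plain single-link clustering. It has no up-to-one constraint, so this illustrates only the caveat, not align_topics itself.

```r
# Four points on a line, each exactly 1 unit from its neighbours.
x <- c(0, 1, 2, 3)
hc <- hclust(dist(x), method = "single")

# Every single-link merge happens at height 1 ...
hc$height  # 1 1 1

# ... so cutting at 1.1 puts all four points in one cluster,
groups <- cutree(hc, h = 1.1)
groups  # 1 1 1 1

# even though the two endpoints are 3 units apart.
max(dist(x[groups == 1]))  # 3
```

Each merge distance is only 1, yet the resulting cluster spans a distance of 3. That is why a small merge distance does not guarantee that all members of a cluster are similar to one another.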
To explore the result, alignment_frame may be useful.
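For inspecting a result by hand, the clusters and distances components can be flattened into one row per topic. The helper flatten_alignment and the toy object below are hypothetical. This is not how alignment_frame is implemented, and its actual output columns may differ.

```r
# Hypothetical object shaped like the clusters/distances components above:
# two models, three topics each.
al <- list(
  clusters  = list(c(1, 2, 3), c(2, 1, 4)),
  distances = list(c(0.10, 0.25, NA), c(0.25, 0.10, NA))
)

# One row per (model, topic), with its cluster number and merge distance.
flatten_alignment <- function(al) {
  do.call(rbind, lapply(seq_along(al$clusters), function(m) data.frame(
    model    = m,
    topic    = seq_along(al$clusters[[m]]),
    cluster  = al$clusters[[m]],
    distance = al$distances[[m]]
  )))
}

fr <- flatten_alignment(al)
sizes <- table(fr$cluster)

# Topics alone in their cluster (no match in any other model).
isolated <- fr[fr$cluster %in% as.numeric(names(sizes)[sizes == 1]), ]

# Clusters that contain one topic from every model.
full <- as.numeric(names(sizes)[sizes == length(al$clusters)])
```

In this toy object, clusters 1 and 2 pair up topics across both models, while topic 3 of each model is isolated.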
Chuang, J, et al. 2015. "TopicCheck: Interactive Alignment for Assessing Topic Model Stability." NAACL HLT. http://scholar.princeton.edu/bstewart/publications/topiccheck-interactive-alignment-assessing-topic-model-stability.
See also model_distances and alignment_frame.
## Not run:
# assume m1, m2, m3 are models
dists <- model_distances(list(m1, m2, m3), n_words=40)
clusters <- align_topics(dists, threshold=0.5)
# data frame readout
alignment_frame(clusters)
## End(Not run)