align_topics: Align topics across models
In agoldst/dfrtopics: Tools for exploring topic models of text

Description

Given the dissimilarities among topics across a set of models, this function attempts to identify groups of similar topics from each model. In particular, it greedily seeks the single-link clustering in which no two topics from the same model are found in the same cluster ("up-to-one mapping"). The idea is from Chuang et al. (2015); the implementation is my own and is slow, unverified, and experimental. To prepare topic dissimilarities to supply to this function, use model_distances.
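The greedy up-to-one idea described above can be sketched in base R. This is an illustration only, not the package's implementation: the function name `greedy_align`, the plain dissimilarity matrix `d`, and the `model` membership vector are all assumptions made for the example, and the real function takes the result of model_distances instead.

```r
# Illustrative sketch only; NOT the dfrtopics implementation.
# d:     symmetric matrix of topic dissimilarities (hypothetical input format)
# model: model[i] is the model that topic i belongs to
greedy_align <- function(d, model, threshold = Inf) {
    cluster <- seq_len(nrow(d))        # every topic starts as its own cluster
    pairs <- which(upper.tri(d), arr.ind = TRUE)
    # keep only cross-model pairs no farther apart than the threshold
    keep <- model[pairs[, 1]] != model[pairs[, 2]] & d[pairs] <= threshold
    pairs <- pairs[keep, , drop = FALSE]
    # visit the closest pairs first (single-link merge order)
    pairs <- pairs[order(d[pairs]), , drop = FALSE]
    for (k in seq_len(nrow(pairs))) {
        a <- cluster[pairs[k, 1]]
        b <- cluster[pairs[k, 2]]
        if (a == b) next               # already in the same cluster
        members <- which(cluster == a | cluster == b)
        # up-to-one constraint: skip merges that would put two topics
        # from the same model into one cluster
        if (anyDuplicated(model[members]) > 0) next
        cluster[cluster == b] <- a
    }
    match(cluster, unique(cluster))    # renumber clusters as 1, 2, ...
}

# Toy data: topics 1-2 come from model 1, topics 3-4 from model 2
d <- matrix(c(0,   0.9,  0.1,  0.2,
              0.9, 0,    0.15, 0.8,
              0.1, 0.15, 0,    0.9,
              0.2, 0.8,  0.9,  0), nrow = 4, byrow = TRUE)
model <- c(1, 1, 2, 2)

greedy_align(d, model, threshold = 0.5)  # 1 2 1 3
greedy_align(d, model)                   # 1 2 1 2
```

With the threshold at 0.5, topics 2 and 4 stay isolated because their only permissible partner is 0.8 away; with no threshold they are eventually joined, which is why a tighter threshold helps expose isolated topics.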

Usage

align_topics(dst, threshold)

print.topic_alignment(x)

Arguments

`dst`	result from `model_distances` (q.v.)
`threshold`	maximum dissimilarity allowed between merging clusters. By default, the threshold is set so that any two topics from different models may ultimately join a cluster. A lower (more aggressive) threshold is recommended in order to expose isolated topics.

Value

a topic_alignment object, which is a list of:

clusters: list of vectors, one for each model, giving cluster numbers of the topics in the model
distances: list of vectors, one for each model, giving the distance at which the given topic merged into its cluster. Because single-link clustering (if I've even implemented it correctly) is subject to "chaining," this is not necessarily an indication of the quality of a cluster, but it may give some hints.
model_distances: The supplied model_distances
threshold: The threshold used

To explore the result, alignment_frame may be useful.
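As a rough sketch of reading the `clusters` element directly, the snippet below finds topics that ended up alone in their cluster. The `aligned` object here is a hand-built, hypothetical stand-in that mimics only the documented `clusters` field (two models, three topics each); it is not output from align_topics.

```r
# Hypothetical stand-in mimicking the documented `clusters` element;
# not real output from align_topics.
aligned <- list(
    clusters = list(c(1, 2, 3),   # cluster numbers for the topics of model 1
                    c(2, 1, 4))   # cluster numbers for the topics of model 2
)

# Count how many topics fall in each cluster across all models
sizes <- table(unlist(aligned$clusters))

# Clusters with a single member correspond to isolated topics
isolated <- as.integer(names(sizes)[sizes == 1])

# For each model, the topic numbers that belong to an isolated cluster
isolated_topics <- lapply(aligned$clusters,
                          function(cl) which(cl %in% isolated))
```

In this toy object, clusters 3 and 4 each have one member, so the third topic of each model is reported as isolated.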

References

Chuang, J., et al. 2015. "TopicCheck: Interactive Alignment for Assessing Topic Model Stability." NAACL HLT. http://scholar.princeton.edu/bstewart/publications/topiccheck-interactive-alignment-assessing-topic-model-stability.

Examples


## Not run: 
# assume m1, m2, m3 are models
dists <- model_distances(list(m1, m2, m3), n_words=40)
clusters <- align_topics(dists, threshold=0.5)
# data frame readout
alignment_frame(clusters)

## End(Not run)

agoldst/dfrtopics documentation built on July 15, 2022, 4:13 p.m.