align_topics: Align topics across models

align_topics R Documentation

Align topics across models

Description

Given the dissimilarities among topics across a set of models, this function attempts to identify groups of similar topics, with at most one topic from each model in any group. In particular, it greedily seeks the single-link clustering in which no two topics from the same model are found in the same cluster ("up-to-one mapping"). The idea is from Chuang et al. (2015); the implementation is my own (slow, unverified, experimental) one. To compute the topic dissimilarities this function expects, use model_distances.
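The greedy procedure described above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the dfrtopics implementation: it assumes the dissimilarities arrive as one symmetric matrix `d` over all topics, plus a vector `model` recording which model each topic belongs to, whereas model_distances returns its own structure. The function name `align_sketch` and its arguments are hypothetical. Cross-model pairs are visited in order of increasing dissimilarity, and two clusters merge only if the pair lies within `threshold` and the merged cluster would still hold at most one topic per model.

```r
# Hypothetical sketch, NOT dfrtopics::align_topics: greedy single-link
# clustering under an "up-to-one mapping" constraint, i.e. no cluster
# may contain two topics from the same model.
#
# model:     integer vector; model[i] is the model topic i comes from
# d:         symmetric matrix of dissimilarities among all topics
# threshold: largest dissimilarity at which two clusters may merge
align_sketch <- function(model, d, threshold = Inf) {
    n <- length(model)
    cl <- seq_len(n)                # each topic starts in its own cluster
    merged_at <- rep(NA_real_, n)   # dissimilarity at which a topic joined

    # every pair of topics from different models, closest first
    pairs <- which(upper.tri(d) & outer(model, model, "!="), arr.ind = TRUE)
    pairs <- pairs[order(d[pairs]), , drop = FALSE]

    for (k in seq_len(nrow(pairs))) {
        a <- pairs[k, 1]
        b <- pairs[k, 2]
        dab <- d[a, b]
        if (dab > threshold) break  # all remaining pairs are farther apart
        ca <- cl[a]
        cb <- cl[b]
        if (ca == cb) next          # already in the same cluster
        # up-to-one constraint: the merged cluster may not repeat a model
        if (any(model[cl == ca] %in% model[cl == cb])) next
        members <- cl == ca | cl == cb
        # topics that were still singletons record this merge distance
        merged_at[members & is.na(merged_at)] <- dab
        cl[cl == cb] <- ca
    }
    cl <- match(cl, unique(cl))     # renumber clusters 1, 2, ...
    list(clusters = split(cl, model), distances = split(merged_at, model))
}
```

Lowering `threshold` stops the loop earlier, so topics without a close counterpart in another model stay in clusters of their own, with NA as their merge distance.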

Usage

align_topics(dst, threshold)

print.topic_alignment(x)

Arguments

dst

the result of model_distances (q.v.)

threshold

maximum dissimilarity at which two clusters may be merged. By default, the threshold is set so that any two topics from different models may ultimately join a cluster. A lower (more aggressive) threshold is recommended, so that isolated topics, those with no close counterpart in the other models, remain in clusters of their own.

x

a topic_alignment object, as returned by align_topics (used by the print method)

Value

a topic_alignment object: a list with the following elements:

clusters

list of vectors, one per model, giving the cluster number assigned to each topic in that model

distances

list of vectors, one per model, giving the dissimilarity at which each topic merged into its cluster. Because single-link clustering (if I've even implemented it correctly) is subject to "chaining," in which a cluster grows through a series of close pairs even though some of its members are far apart, this value is not a reliable indicator of a cluster's quality, though it may offer some hints.

model_distances

the supplied model_distances result (the dst argument)

threshold

the threshold used

To explore the result, alignment_frame may be useful: it gives a data frame readout of the alignment.
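The chaining caveat noted under distances can be seen with ordinary single-link clustering from base R's stats package (`hclust` with `method = "single"`), which does not enforce the up-to-one constraint. The three topics A, B, and C and their dissimilarities are invented for illustration and are assumed to come from three different models, so the constraint would not block any merge here anyway.

```r
# Chaining in plain single-link clustering (base R stats::hclust).
# A-B and B-C are close, A-C is far; topics assumed to be from 3 models.
d <- matrix(c(0,   0.1, 0.9,
              0.1, 0,   0.1,
              0.9, 0.1, 0), nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
h <- hclust(as.dist(d), method = "single")
h$height             # both merges happen at 0.1
cutree(h, h = 0.5)   # A, B and C all land in cluster 1
d["A", "C"]          # yet A and C are 0.9 apart
```

Every merge height is small, yet the resulting cluster contains a pair of topics that are quite dissimilar, which is why small merge distances do not by themselves guarantee a coherent cluster.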

References

Chuang, J., et al. 2015. "TopicCheck: Interactive Alignment for Assessing Topic Model Stability." NAACL HLT. http://scholar.princeton.edu/bstewart/publications/topiccheck-interactive-alignment-assessing-topic-model-stability.

See Also

model_distances, alignment_frame

Examples


## Not run: 
# assume m1, m2, m3 are fitted topic models
dists <- model_distances(list(m1, m2, m3), n_words=40)
# merge only clusters within dissimilarity 0.5, leaving poorly matched topics isolated
aln <- align_topics(dists, threshold=0.5)
# data frame readout
alignment_frame(aln)

## End(Not run)


agoldst/dfrtopics documentation built on July 15, 2022, 4:13 p.m.