jsTopics R Documentation
Calculates the similarity of all pairwise topic combinations using the Jensen-Shannon Divergence.
Usage

jsTopics(topics, epsilon = 1e-06, progress = TRUE, pm.backend, ncpus)
Arguments

topics
[named matrix] Counts of vocabularies (row-wise) in topics (column-wise).

epsilon
[numeric(1)] Numerical value added to each count to ensure computability with respect to zeros. Default is 1e-6.

progress
[logical(1)] Should a progress bar be printed? Default is TRUE.

pm.backend
[character(1)] One of "multicore", "socket" or "mpi". If pm.backend is set, parallelStart is called before computation is started and parallelStop is called after.

ncpus
[integer(1)] Number of (physical) CPUs to use. If pm.backend is passed, the default is determined by availableCores.
Details

The Jensen-Shannon Similarity for two topics \bm z_{i} and \bm z_{j} is calculated by

JS(\bm z_{i}, \bm z_{j}) = 1 - \left( KLD\left(\bm p_i, \frac{\bm p_i + \bm p_j}{2}\right) + KLD\left(\bm p_j, \frac{\bm p_i + \bm p_j}{2}\right) \right)/2
= 1 - KLD(\bm p_i, \bm p_i + \bm p_j)/2 - KLD(\bm p_j, \bm p_i + \bm p_j)/2 - \log(2)

with V the vocabulary size, \bm p_k = \left(p_k^{(1)}, ..., p_k^{(V)}\right), and p_k^{(v)} the proportion of assignments of the v-th word to the k-th topic. KLD denotes the Kullback-Leibler Divergence, calculated by

KLD(\bm p_{k}, \bm p_{\Sigma}) = \sum_{v=1}^{V} p_k^{(v)} \log{\frac{p_k^{(v)}}{p_{\Sigma}^{(v)}}}.
An epsilon is added to every n_k^{(v)}, the count (not proportion) of assignments, to ensure computability with respect to zeros.
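The formula above can be sketched directly in R. The following is an illustrative implementation, not the package's internal one; the function name `js_sim` and its signature are assumptions made for this sketch:

```r
# Illustrative sketch of the Jensen-Shannon similarity from the Details
# section (not the package's internal implementation). `counts` is a
# V x K matrix of word-topic assignment counts n_k^(v); `epsilon` is the
# smoothing constant added to every count.
js_sim <- function(counts, epsilon = 1e-6) {
  counts <- counts + epsilon                    # smooth zero counts
  p <- sweep(counts, 2, colSums(counts), "/")   # column-wise proportions p_k
  K <- ncol(p)
  sims <- matrix(NA_real_, K, K, dimnames = list(colnames(p), colnames(p)))
  kld <- function(a, b) sum(a * log(a / b))     # Kullback-Leibler Divergence
  for (i in seq_len(K - 1)) {
    for (j in (i + 1):K) {
      m <- (p[, i] + p[, j]) / 2
      # JS similarity: 1 minus the Jensen-Shannon Divergence
      sims[j, i] <- 1 - (kld(p[, i], m) + kld(p[, j], m)) / 2
    }
  }
  sims  # lower triangular matrix of pairwise similarities
}
```

Since KLD(p, m) is zero only for identical distributions, the resulting similarities lie in (1 - log(2), 1], with 1 attained for identical topics.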
Value

[named list] with entries

sims
[lower triangular named matrix] with all pairwise similarities of the given topics.

wordslimit
[integer] = vocabulary size. See jaccardTopics for original purpose.

wordsconsidered
[integer] = vocabulary size. See jaccardTopics for original purpose.

param
[named list] with parameter specifications for type [character(1)] = "Jensen-Shannon Divergence" and epsilon [numeric(1)]. See above for explanation.
Other TopicSimilarity functions:
cosineTopics(),
dendTopics(),
getSimilarity(),
jaccardTopics(),
rboTopics()
Examples

res = LDARep(docs = reuters_docs, vocab = reuters_vocab, n = 4, K = 10, num.iterations = 30)
topics = mergeTopics(res, vocab = reuters_vocab)
js = jsTopics(topics)
js

sim = getSimilarity(js)
dim(sim)

js1 = jsTopics(topics, epsilon = 1)
sim1 = getSimilarity(js1)
summary((sim1-sim)[lower.tri(sim)])
plot(sim, sim1, xlab = "epsilon = 1e-6", ylab = "epsilon = 1")