ldaPrototype: Prototype of Multiple Latent Dirichlet Allocation Runs

SCLOP

R Documentation

Similarity/Stability of multiple sets of Objects using Clustering with Local Pruning

Description

The function SCLOP calculates the S-CLOP value for the best possible local pruning state of a dendrogram from dendTopics. The function pruneSCLOP supplies the corresponding pruning state itself.
To get the S-CLOP scores for every pair of LDA runs, use the function SCLOP.pairwise. It returns a matrix of the pairwise S-CLOP scores.
All three functions use the function disparitySum to calculate the least possible sum of disparities (on the best possible local pruning state) on a given dendrogram.

Usage

SCLOP(dend)

disparitySum(dend)

SCLOP.pairwise(sims)

Arguments

`dend`	[`dendrogram`] Output from `dendTopics`.
`sims`	[`TopicSimilarity` object or `lower triangular named matrix`] Either a `TopicSimilarity` object or the pairwise Jaccard similarities of the underlying topics, i.e. the `sims` element of a `TopicSimilarity` object. The topic names should be formatted as <Run X>.<Topic Y>, so that the name before the first dot identifies the LDA run.
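
The naming convention for `sims` can be illustrated with a minimal base R sketch. The topic names below are hypothetical and only follow the <Run X>.<Topic Y> pattern described above; the snippet shows how the part before the first dot can be read as the run identifier, and it is not taken from the package internals.

```r
# Hypothetical topic names in the "<Run X>.<Topic Y>" format (illustrative only)
topic_names <- c("LDARep1.1", "LDARep1.2", "LDARep2.1", "LDARep2.2")

# Drop everything from the first dot onwards; what remains identifies the run
run_id <- sub("\\..*$", "", topic_names)

# Number of topics per run
table(run_id)
```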

Details

For a specific cluster g and R LDA runs, the disparity is calculated as

U(g) := \frac{1}{R} \sum_{r=1}^R \vert t_r^{(g)} - 1 \vert \cdot \sum_{r=1}^R t_r^{(g)},

where \bm{t}^{(g)} = (t_1^{(g)}, ..., t_R^{(g)})^T contains, for each LDA run r, the number of topics from that run that occur in cluster g.
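
As a rough illustration of the disparity formula, the following base R sketch evaluates U(g) for a few count vectors. The function name `disparity` and the example vectors are assumptions made for illustration; this is not the package's implementation.

```r
# Disparity U(g) of a single cluster, following the formula above.
# `t` is a hypothetical count vector: t[r] = number of topics from run r in the cluster.
disparity <- function(t) {
  R <- length(t)                 # number of LDA runs
  sum(abs(t - 1)) / R * sum(t)   # (1/R) * sum_r |t_r - 1| * sum_r t_r
}

disparity(c(1, 1, 1, 1))  # every run contributes exactly one topic: 0
disparity(c(1, 0, 0, 0))  # a single topic from one run: 0.75
disparity(c(2, 1, 0, 1))  # one run duplicated, one run missing: 2
```

A cluster has zero disparity exactly when each run contributes one topic to it; both missing and duplicated runs increase the value.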

The function disparitySum returns the least possible sum of disparities U_{\Sigma}(G^*), where G^* is the pruning state that minimizes U_{\Sigma}(G) = \sum_{g \in G} U(g). The value of U_{\Sigma}(G^*) is bounded from above by

U_{\Sigma,\textsf{max}} := \sum_{g \in \tilde{G}} U(g) = N \cdot \frac{R-1}{R},

where \tilde{G} denotes the corresponding worst case pruning state and N the total number of topics over all runs. This worst case is used to normalize the S-CLOP scores.
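
The upper bound can be checked numerically with a short base R sketch. It assumes that the worst case corresponds to every topic forming its own cluster and that N counts all topics over all runs; both are readings of the formula above rather than statements from the package source, and the values of R and N are illustrative.

```r
# Disparity of one cluster; t[r] = number of topics from run r (as defined above)
disparity <- function(t) sum(abs(t - 1)) / length(t) * sum(t)

R <- 4   # number of LDA runs (illustrative value)
N <- 40  # total number of topics, e.g. 4 runs with 10 topics each (assumption)

# A singleton cluster holds one topic from one run and none from the others
singleton <- c(1, rep(0, R - 1))

# Summing over N singleton clusters reproduces the bound N * (R - 1) / R
u_max <- N * disparity(singleton)
u_max
```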

The function SCLOP then calculates the value

\textsf{S-CLOP}(G^*) := 1 - \frac{1}{U_{\Sigma,\textsf{max}}} \cdot \sum_{g \in G^*} U(g) \in [0,1],

where \sum_{g \in G^*} U(g) = U_{\Sigma}(G^*).
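
The normalization step can be sketched in base R as follows. The helper `sclop_from_disparity` is hypothetical and only mirrors the formula above; it takes a disparity sum (such as the value returned by disparitySum), the total number of topics N (assumed to count all topics over all runs) and the number of runs R.

```r
# Hypothetical helper: normalize a disparity sum to an S-CLOP value.
# u_sum: least possible sum of disparities on the dendrogram
# N:     total number of topics over all runs (assumption)
# R:     number of LDA runs
sclop_from_disparity <- function(u_sum, N, R) {
  u_max <- N * (R - 1) / R   # worst case sum of disparities
  1 - u_sum / u_max
}

sclop_from_disparity(0,   N = 40, R = 4)  # no disparity at all: 1
sclop_from_disparity(7.5, N = 40, R = 4)  # a quarter of the worst case: 0.75
sclop_from_disparity(30,  N = 40, R = 4)  # worst case: 0
```

Values close to 1 therefore indicate that the runs can be pruned into clusters in which each run is represented about once.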

Value

SCLOP: value in [0,1] specifying the S-CLOP for the best possible local pruning state of the given dendrogram.
disparitySum: [numeric(1)] value specifying the least possible sum of disparities on the given dendrogram.
SCLOP.pairwise: [symmetric named matrix] with all pairwise S-CLOP scores of the given LDA runs.

Examples

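library(ldaPrototype)

# Four replicated LDA runs with 10 topics each on the Reuters example data
res = LDARep(docs = reuters_docs, vocab = reuters_vocab, n = 4, K = 10, num.iterations = 30)
# Collect the topics of all runs in one object
topics = mergeTopics(res, vocab = reuters_vocab)
# Pairwise Jaccard similarities of the topics
jacc = jaccardTopics(topics, atLeast = 2)
# Cluster the topics into a dendrogram
dend = dendTopics(jacc)

SCLOP(dend)
disparitySum(dend)

# sims accepts the TopicSimilarity object or its similarity matrix
SCLOP.pairwise(jacc)
SCLOP.pairwise(getSimilarity(jacc))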

JonasRieger/ldaPrototype documentation built on Feb. 5, 2023, 6:45 p.m.