rboTopics: Pairwise RBO Similarities
In JonasRieger/ldaPrototype: Prototype of Multiple Latent Dirichlet Allocation Runs

Description

Calculates the rank-biased overlap (RBO) similarity for all pairwise combinations of topics.

Usage

rboTopics(topics, k, p, progress = TRUE, pm.backend, ncpus)

Arguments

`topics`	[`named matrix`] The counts of vocabulary words (row-wise) in topics (column-wise).
`k`	[`integer(1)`] Maximum depth for evaluation. Words down to this rank are considered for the calculation of similarities.
`p`	[0,1] Weighting parameter. Lower values emphasize top-ranked words, while values approaching 1 correspond to equal weights for each evaluation depth.
`progress`	[`logical(1)`] Should a progress bar be shown? Turning it off can make the calculation significantly faster. Default is `TRUE`. If `pm.backend` is set, the computation is parallelized and no progress bar is shown.
`pm.backend`	[`character(1)`] One of "multicore", "socket" or "mpi". If `pm.backend` is set, `parallelStart` is called before computation is started and `parallelStop` is called after.
`ncpus`	[`integer(1)`] Number of (physical) CPUs to use. If `pm.backend` is passed, default is determined by `availableCores`.
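
To make the role of `p` more concrete, the following base R sketch computes the weight that each evaluation depth receives under the formula given in Details. `rbo_weights` is a hypothetical helper written for illustration and is not part of ldaPrototype; it only reads the coefficients off that formula, assuming 0 < p <= 1.

```r
# Hypothetical helper (not part of ldaPrototype): weight of each depth
# d = 1, ..., k in the RBO formula, assuming 0 < p <= 1.
rbo_weights <- function(k, p) {
  # Coefficient of the agreement at depth d inside the sum: (1 - p) / p * p^d
  w <- (1 - p) / p * p^seq_len(k)
  # The leading term of the formula adds p^k to the agreement at full depth k
  w[k] <- w[k] + p^k
  w
}

round(rbo_weights(k = 5, p = 0.5), 4)
round(rbo_weights(k = 5, p = 0.9), 4)
```

In this sketch the weights always sum to 1. With p = 0.5 the first rank alone carries weight 0.5, whereas with p = 0.9 it carries only 0.1 and the remaining weight is spread over the deeper ranks, with the residual p^k falling on the agreement at depth k.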

Details

The RBO Similarity for two topics \bm z_{i} and \bm z_{j} is calculated by

RBO(\bm z_{i}, \bm z_{j} \mid k, p) = 2p^k\frac{\left|Z_{i}^{(k)} \cap Z_{j}^{(k)}\right|}{\left|Z_{i}^{(k)}\right| + \left|Z_{j}^{(k)}\right|} + \frac{1-p}{p} \sum_{d=1}^k 2 p^d\frac{\left|Z_{i}^{(d)} \cap Z_{j}^{(d)}\right|}{\left|Z_{i}^{(d)}\right| + \left|Z_{j}^{(d)}\right|}

where Z_{i}^{(d)} is the vocabulary set of topic \bm z_{i} down to rank d. Ties in ranks are resolved by taking the minimum.
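
The formula can be traced by hand for a single pair of topics. The base R sketch below is an illustration only, not the package's implementation: `rbo_pair` is a hypothetical helper, the count vectors are made up, and it assumes that words are ranked by descending count and that a word belongs to Z^{(d)} when its (minimum) rank is at most d.

```r
# Hypothetical helper (not part of ldaPrototype): RBO similarity of two
# topics given as word count vectors over the same vocabulary.
rbo_pair <- function(x, y, k, p) {
  # Rank words by descending count; tied words share the minimum rank
  rx <- rank(-x, ties.method = "min")
  ry <- rank(-y, ties.method = "min")
  # Agreement at depth d: 2 * |Z_i ∩ Z_j| / (|Z_i| + |Z_j|)
  agreement <- function(d) {
    zi <- rx <= d
    zj <- ry <= d
    2 * sum(zi & zj) / (sum(zi) + sum(zj))
  }
  a <- vapply(seq_len(k), agreement, numeric(1))
  # Leading term at depth k plus the weighted sum over depths 1, ..., k
  p^k * a[k] + (1 - p) / p * sum(p^seq_len(k) * a)
}

# Made-up counts for four words in two topics
x <- c(a = 4, b = 3, c = 2, d = 1)
y <- c(a = 4, b = 1, c = 2, d = 3)
rbo_pair(x, y, k = 2, p = 0.5)  # 0.75
rbo_pair(x, x, k = 3, p = 0.9)  # identical topics give 1
```

For the toy data, both topics share their top word (agreement 1 at depth 1) and one of two words at depth 2 (agreement 0.5), so the result is 0.25 * 0.5 + 1 * (0.5 * 1 + 0.25 * 0.5) = 0.75.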

The value wordsconsidered describes the number of words per topic ranked at rank k or above.
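
Because tied words share the minimum rank, more than k words can be ranked at rank k or above. The following base R sketch illustrates this counting on a made-up matrix; the data and the variable names are assumptions for illustration and do not come from the package.

```r
# Toy topic-word count matrix: words in rows, topics in columns (made-up data)
topics <- matrix(c(5, 3, 3, 1,
                   2, 2, 2, 2),
                 ncol = 2,
                 dimnames = list(c("w1", "w2", "w3", "w4"), c("t1", "t2")))
k <- 2

# Rank words per topic by descending count; tied words share the minimum rank
ranks <- apply(-topics, 2, rank, ties.method = "min")

# Number of words per topic ranked at rank k or above
words_considered <- colSums(ranks <= k)
words_considered
```

Here topic `t1` has two words tied at rank 2, so three words are counted, and in topic `t2` all four words are tied at rank 1, so all four are counted.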

Value

[named list] with entries

sims: [lower triangular named matrix] with all pairwise similarities of the given topics.
wordslimit: [integer] = vocabulary size. See jaccardTopics for original purpose.
wordsconsidered: [integer] = vocabulary size. See jaccardTopics for original purpose.
param: [named list] with parameter type [character(1)] = "RBO Similarity", k [integer(1)] and p [0,1]. See above for explanation.

References

Webber, William, Alistair Moffat and Justin Zobel (2010). "A similarity measure for indefinite rankings". In: ACM Transactions on Information Systems 28(4), pp. 20:1–20:38, DOI 10.1145/1852102.1852106, URL https://doi.acm.org/10.1145/1852102.1852106

Examples

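library(ldaPrototype)

# Fit 4 LDA replications with 10 topics each on the Reuters example data
res = LDARep(docs = reuters_docs, vocab = reuters_vocab, n = 4, K = 10, num.iterations = 30)
# Merge the topics of all replications into one word-by-topic count matrix
topics = mergeTopics(res, vocab = reuters_vocab)
# Pairwise RBO similarities, evaluated down to rank 12 with weighting p = 0.9
rbo = rboTopics(topics, k = 12, p = 0.9)
rbo

# Extract the matrix of pairwise similarities
sim = getSimilarity(rbo)
dim(sim)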

JonasRieger/ldaPrototype documentation built on Feb. 5, 2023, 6:45 p.m.