rboTopics: Pairwise RBO Similarities

rboTopics    R Documentation

Description

Calculates the similarity of all pairwise topic combinations using the rank-biased overlap (RBO) Similarity.

Usage

rboTopics(topics, k, p, progress = TRUE, pm.backend, ncpus)

Arguments

topics

[named matrix]
The counts of vocabulary words (row-wise) in topics (column-wise).

k

[integer(1)]
Maximum depth for evaluation. Words down to this rank are considered for the calculation of similarities.

p

[0,1]
Weighting parameter. Lower values emphasize top-ranked words, while values approaching 1 correspond to equal weights for each evaluation depth.

progress

[logical(1)]
Should a progress bar be shown? Turning it off can make the calculation significantly faster. Default is TRUE. If pm.backend is set, the computation is parallelized and no progress bar is shown.

pm.backend

[character(1)]
One of "multicore", "socket" or "mpi". If pm.backend is set, parallelStart is called before computation is started and parallelStop is called after.

ncpus

[integer(1)]
Number of (physical) CPUs to use. If pm.backend is passed, default is determined by availableCores.

Details

The RBO Similarity for two topics \bm z_{i} and \bm z_{j} is calculated by

RBO(\bm z_{i}, \bm z_{j} \mid k, p) = 2p^k \frac{\left|Z_{i}^{(k)} \cap Z_{j}^{(k)}\right|}{\left|Z_{i}^{(k)}\right| + \left|Z_{j}^{(k)}\right|} + \frac{1-p}{p} \sum_{d=1}^k 2 p^d \frac{\left|Z_{i}^{(d)} \cap Z_{j}^{(d)}\right|}{\left|Z_{i}^{(d)}\right| + \left|Z_{j}^{(d)}\right|}

where Z_{i}^{(d)} is the vocabulary set of topic \bm z_{i} down to rank d. Ties in ranks are resolved by taking the minimum.

The value wordsconsidered describes the number of words per topic ranked at rank k or above.
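
The formula above can be illustrated for a single pair of topics with a minimal base R sketch. The helper rbo_pair is hypothetical and is not the package's internal implementation. It assumes that words are ranked by descending count, that zero-count words are ranked as well, and that ties receive the minimum rank, as described above.

```r
# Hypothetical helper for illustration only (not part of ldaPrototype).
# x, y: named numeric vectors of word counts for two topics (same word order).
rbo_pair = function(x, y, k, p) {
  # Rank words by descending count; tied words share the minimum rank.
  rx = rank(-x, ties.method = "min")
  ry = rank(-y, ties.method = "min")
  # Agreement at depth d: 2 * |Z_i^(d) n Z_j^(d)| / (|Z_i^(d)| + |Z_j^(d)|)
  agreement = vapply(seq_len(k), function(d) {
    zi = rx <= d
    zj = ry <= d
    2 * sum(zi & zj) / (sum(zi) + sum(zj))
  }, numeric(1))
  # First term: agreement at depth k weighted by p^k.
  # Second term: (1 - p) / p times the p^d-weighted sum over depths 1..k.
  p^k * agreement[k] + (1 - p) / p * sum(p^seq_len(k) * agreement)
}

# Two toy topics over four words whose top-two words are swapped.
topic_a = c(w1 = 5, w2 = 4, w3 = 3, w4 = 0)
topic_b = c(w1 = 4, w2 = 5, w3 = 0, w4 = 3)
rbo_pair(topic_a, topic_b, k = 2, p = 0.5)
```

For two identical count vectors the agreement is 1 at every depth and the weights sum to one, so the sketch returns 1 (up to floating-point error).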

Value

[named list] with entries

sims

[lower triangular named matrix] with all pairwise similarities of the given topics.

wordslimit

[integer] = vocabulary size. See jaccardTopics for original purpose.

wordsconsidered

[integer] = vocabulary size. See jaccardTopics for original purpose.

param

[named list] with parameter type [character(1)] = "RBO Similarity", k [integer(1)] and p [0,1]. See above for explanation.

References

Webber, William, Alistair Moffat and Justin Zobel (2010). "A similarity measure for indefinite rankings". In: ACM Transactions on Information Systems 28(4), pp. 20:1–20:38, DOI 10.1145/1852102.1852106, URL https://doi.acm.org/10.1145/1852102.1852106

See Also

Other TopicSimilarity functions: cosineTopics(), dendTopics(), getSimilarity(), jaccardTopics(), jsTopics()

Examples

res = LDARep(docs = reuters_docs, vocab = reuters_vocab, n = 4, K = 10, num.iterations = 30)
topics = mergeTopics(res, vocab = reuters_vocab)
rbo = rboTopics(topics, k = 12, p = 0.9)
rbo

sim = getSimilarity(rbo)
dim(sim)


JonasRieger/ldaPrototype documentation built on Feb. 5, 2023, 6:45 p.m.