cosineTopics: Pairwise Cosine Similarities

View source: R/cosineTopics.R


Pairwise Cosine Similarities

Description

Calculates the similarity of all pairwise topic combinations using the Cosine Similarity.

Usage

cosineTopics(topics, progress = TRUE, pm.backend, ncpus)

Arguments

topics

[named matrix]
The counts of vocabulary words (rows) assigned to topics (columns).

progress

[logical(1)]
Should a nice progress bar be shown? Turning it off can lead to a significantly faster calculation. Default is TRUE. If pm.backend is set, the computation is parallelized and no progress bar is shown.

pm.backend

[character(1)]
One of "multicore", "socket" or "mpi". If pm.backend is set, parallelStart is called before the computation starts and parallelStop is called afterwards.

ncpus

[integer(1)]
Number of (physical) CPUs to use. If pm.backend is passed, the default is determined by availableCores.

Details

The Cosine Similarity for two topics \bm z_{i} and \bm z_{j} is calculated by

\cos(\theta \mid \bm{z}_{i}, \bm{z}_{j}) = \frac{\sum_{v=1}^{V} n_{i}^{(v)} n_{j}^{(v)}}{\sqrt{\sum_{v=1}^{V} \left(n_{i}^{(v)}\right)^2} \sqrt{\sum_{v=1}^{V} \left(n_{j}^{(v)}\right)^2}}

with θ denoting the angle between the corresponding count vectors \bm z_{i} and \bm z_{j}, V the vocabulary size, and n_k^{(v)} the number of assignments of the v-th word to the k-th topic.
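
The formula can be checked with a few lines of base R. What follows is a minimal sketch, not the ldaPrototype implementation: the function name cosineSketch is hypothetical, and it assumes a plain numeric words-by-topics count matrix. All dot products are computed at once with crossprod and divided by the outer product of the column norms. Filling everything outside the strict lower triangle with NA is an assumption about the shape described under Value.

```r
# Minimal base-R sketch of the cosine similarity formula above;
# an illustration only, not the ldaPrototype implementation.
# topics: numeric count matrix, words in rows, topics in columns (assumed).
cosineSketch = function(topics) {
  # Euclidean norm of each topic column: sqrt(sum_v (n_k^(v))^2)
  norms = sqrt(colSums(topics^2))
  # Dot products n_i . n_j for all topic pairs (crossprod(X) is t(X) %*% X)
  dots = crossprod(topics)
  sims = dots / outer(norms, norms)
  # Keep only the strict lower triangle; NA elsewhere is an assumption
  sims[!lower.tri(sims)] = NA
  sims
}

# Toy counts: 4 words (rows) x 3 topics (columns)
topics = matrix(c(2, 0, 1, 0,
                  1, 0, 0, 0,
                  0, 3, 0, 4),
                nrow = 4,
                dimnames = list(paste0("w", 1:4), paste0("T", 1:3)))
s = cosineSketch(topics)
s["T2", "T1"]  # 2 / sqrt(5), about 0.894
```

A topic whose counts are all zero has norm 0, so its similarities come out as NaN in this sketch.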

Value

[named list] with entries

sims

[lower triangular named matrix] with all pairwise similarities of the given topics.

wordslimit

[integer] = vocabulary size. See jaccardTopics for its original purpose.

wordsconsidered

[integer] = vocabulary size. See jaccardTopics for its original purpose.

param

[named list] with parameter type [character(1)] = "Cosine Similarity".
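
Because sims stores each pair only once, downstream code often needs either a full symmetric matrix or a table of pairs. The two helpers below, symmetrizeSims and simPairs, are hypothetical and not part of ldaPrototype. As a sketch, they assume that entries on and above the diagonal of sims are NA, and symmetrizeSims sets the diagonal to 1, which holds for any topic with at least one nonzero count.

```r
# Hypothetical helpers, not part of ldaPrototype. Assumes `sims` holds
# similarities in its strict lower triangle and NA elsewhere.

# Mirror the lower triangle into the upper one; the diagonal is set to 1
# because the cosine of a nonzero vector with itself is 1.
symmetrizeSims = function(sims) {
  full = sims
  upper = upper.tri(full)
  full[upper] = t(sims)[upper]
  diag(full) = 1
  full
}

# One row per topic pair (row index > column index), read from the lower triangle
simPairs = function(sims) {
  idx = which(lower.tri(sims), arr.ind = TRUE)
  data.frame(topic1 = rownames(sims)[idx[, "row"]],
             topic2 = colnames(sims)[idx[, "col"]],
             sim = sims[idx],
             stringsAsFactors = FALSE)
}

# Toy lower triangular matrix for 3 topics (values are made up)
sims = matrix(NA_real_, nrow = 3, ncol = 3,
              dimnames = list(paste0("T", 1:3), paste0("T", 1:3)))
sims[lower.tri(sims)] = c(0.9, 0.1, 0.4)  # fills (2,1), (3,1), (3,2)

full = symmetrizeSims(sims)
pairs = simPairs(sims)
```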

See Also

Other TopicSimilarity functions: dendTopics(), getSimilarity(), jaccardTopics(), jsTopics(), rboTopics()

Examples

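library(ldaPrototype)
# Fit 4 LDA replications with K = 10 topics each on the bundled Reuters data
res = LDARep(docs = reuters_docs, vocab = reuters_vocab, n = 4, K = 10, num.iterations = 30)
# Combine the word counts of all 4 x 10 topics into one words-by-topics matrix
topics = mergeTopics(res, vocab = reuters_vocab)
# Pairwise cosine similarities between all merged topics
cosine = cosineTopics(topics)
cosine

# Extract the similarity matrix
sim = getSimilarity(cosine)
dim(sim)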


JonasRieger/ldaPrototype documentation built on Feb. 5, 2023, 6:45 p.m.