model_distances: Calculate topic dissimilarity across models

model_distances    R Documentation

Calculate topic dissimilarity across models

Description

This function calculates dissimilarities between the topic-word distributions of every pair of models in a list. The result can be used to align topics in different models of the same (or similar) corpora; see align_topics.

Usage

model_distances(ms, n_words, g = JS_divergence)

## S3 method for class 'model_distances'
x[m1, m2, i, j]

print.model_distances(x)

Arguments

ms

list of mallet_model objects

n_words

number of top words from each topic to consider

g

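dissimilarity function that takes two topic-word matrices and returns the matrix of dissimilarities between their rows, d_{ij} = g(θ_i, θ_j), where θ_i is row i of the first matrix and θ_j is row j of the second. By default, the Jensen-Shannon divergence is used (JS_divergence); the cosine distance (cosine_distance) is an alternative. If you have a function f of two vectors, you can lift it to matrix rows, at a speed penalty, as function (X, Y) apply(Y, 1, function (y) apply(X, 1, f, y)). (N.B. the transpose is necessary: apply binds each inner result as a column, so the outer apply must run over the rows of Y for entry [i, j] to compare row i of X with row j of Y.)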
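As an illustration of the lifting recipe for g, here is a minimal sketch. The vector-level Jensen-Shannon function js_vec and the helper lift_rows are hypothetical names written for this example; they are not the package's JS_divergence, which already works on whole matrices.

```r
# Hypothetical vector-level dissimilarity: Jensen-Shannon divergence
# between two probability vectors p and q (natural log).
js_vec <- function (p, q) {
    m <- (p + q) / 2
    # zero-probability entries of a contribute 0 to the KL sum
    kl <- function (a, b) sum(ifelse(a > 0, a * log(a / b), 0))
    (kl(p, m) + kl(q, m)) / 2
}

# Hypothetical helper: lift a two-vector function f to a function of two
# matrices whose [i, j] entry is f(X[i, ], Y[j, ]). The outer apply runs
# over rows of Y because apply() binds each inner result as a column.
lift_rows <- function (f) {
    function (X, Y) apply(Y, 1, function (y) apply(X, 1, f, y))
}

g_js <- lift_rows(js_vec)

# Toy topic-word matrices: each row is a distribution over 3 words
X <- rbind(c(0.5, 0.5, 0), c(0.1, 0.2, 0.7))
Y <- rbind(c(0.5, 0.5, 0), c(0, 0, 1), c(1/3, 1/3, 1/3))
D <- g_js(X, Y)
dim(D)   # a 2 x 3 matrix: rows index topics of X, columns topics of Y
```

Because X[1, ] and Y[1, ] are identical, D[1, 1] is 0. A function like g_js could then be passed as the g argument, bearing in mind the speed penalty noted above.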

Details

The models in ms need not have the same number of topics.
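Models with different numbers of topics pose no problem because g returns a rectangular matrix with one row per topic of the first model and one column per topic of the second. The sketch below shows this with a hypothetical vectorized cosine distance, cosine_dist_rows (an illustration, not the package's cosine_distance), and assumes both toy models share the same vocabulary columns.

```r
# Hypothetical vectorized cosine distance between the rows of X and Y.
cosine_dist_rows <- function (X, Y) {
    Xn <- X / sqrt(rowSums(X^2))   # scale each row of X to unit length
    Yn <- Y / sqrt(rowSums(Y^2))   # scale each row of Y to unit length
    1 - Xn %*% t(Yn)               # [i, j] = 1 - cos(X[i, ], Y[j, ])
}

set.seed(1)
# Two toy "models" over the same 6-word vocabulary, with 3 and 5 topics;
# prop.table(..., 1) normalizes each row into a topic-word distribution.
tw1 <- prop.table(matrix(runif(3 * 6), nrow = 3), 1)
tw2 <- prop.table(matrix(runif(5 * 6), nrow = 5), 1)

D12 <- cosine_dist_rows(tw1, tw2)
dim(D12)   # 3 rows (model 1 topics) by 5 columns (model 2 topics)
```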

Value

a model_distances object: a list including element d, a list of lists of matrices holding the upper block triangle of the distances, together with elements ms, n_words, and g storing the arguments. If x is the result of the function, the dissimilarity between topic i of model m1 and topic j of model m2, where m2 > m1, is x$d[[m1]][[m2 - m1]][i, j]. For convenience, the same value can be obtained as x[m1, m2, i, j].
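The block-triangle layout of d can be sketched in base R as follows. This is a hypothetical reconstruction of the indexing scheme only: block_distances, lookup, and tv_dist are illustrative names, the inputs are plain topic-word matrices over a shared vocabulary rather than mallet_model objects, and in practice you would use the package's own x[m1, m2, i, j] accessor.

```r
# Simple row-wise dissimilarity for the demo: total variation distance
# (half the L1 distance) between each row of X and each row of Y.
tv_dist <- function (X, Y)
    apply(Y, 1, function (y) apply(X, 1, function (x) sum(abs(x - y)) / 2))

# Hypothetical: d[[m1]][[m2 - m1]] holds g(model m1, model m2) for m2 > m1.
block_distances <- function (tws, g) {
    n <- length(tws)
    lapply(seq_len(n - 1), function (m1)
        lapply((m1 + 1):n, function (m2) g(tws[[m1]], tws[[m2]])))
}

# Hypothetical lookup mirroring x$d[[m1]][[m2 - m1]][i, j]; needs m1 < m2.
lookup <- function (d, m1, m2, i, j) {
    stopifnot(m1 < m2)
    d[[m1]][[m2 - m1]][i, j]
}

set.seed(2)
# Three toy models with 2, 3 and 4 topics over a 5-word vocabulary
tws <- lapply(c(2, 3, 4), function (k)
    prop.table(matrix(runif(k * 5), nrow = k), 1))
d <- block_distances(tws, tv_dist)
length(d)         # 2: blocks for m1 = 1 and m1 = 2
length(d[[1]])    # 2: model 1 against models 2 and 3
dim(d[[1]][[2]])  # 2 x 4: topics of model 1 by topics of model 3
lookup(d, 2, 3, 1, 4)  # topic 1 of model 2 vs. topic 4 of model 3
```

Storing only pairs with m2 > m1 avoids computing each pair of models twice, which is why the second index is offset as m2 - m1.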

See Also

align_topics


agoldst/dfrtopics documentation built on July 15, 2022, 4:13 p.m.