mi_check (R Documentation)
This function checks the fit of a topic model by comparing the mutual information (MI) between words and documents observed within a topic to values obtained by simulating from the posterior. Large deviations from the simulated values may indicate a poorer fit.
mi_check(m, k, groups = NULL, n_reps = 20)
m | the topic model to check |
k | topic number (calculations are done for only one topic at a time) |
groups | optional grouping factor for documents. If supplied, the MI values are computed for words over groups rather than over individual documents |
n_reps | number of simulations (default 20) |
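When groups is supplied, documents are in effect pooled by group before MI is calculated, so MI is measured between words and groups. The base-R sketch below illustrates that pooling only. It assumes the topic-specific counts are held in a words-by-documents matrix, and the helper name collapse_groups is hypothetical, not part of the package API.

```r
# Hypothetical helper: pool the document columns of a topic-specific
# words-by-documents count matrix into one column per group.
collapse_groups <- function(counts, groups) {
    groups <- as.factor(groups)
    stopifnot(length(groups) == ncol(counts))
    # rowsum() sums rows that share a group label, so transpose to put
    # documents in rows, sum within groups, then transpose back to
    # a words-by-groups matrix.
    t(rowsum(t(counts), groups))
}

counts <- matrix(c(1, 0, 2,
                   0, 3, 1), nrow = 2, byrow = TRUE,
                 dimnames = list(c("w1", "w2"), c("d1", "d2", "d3")))
collapse_groups(counts, c("a", "b", "a"))
# w1: a = 3, b = 0; w2: a = 1, b = 3
```

Whether the package pools raw counts in exactly this way is an assumption; the sketch only shows what "over groups rather than over individual documents" means for the matrix whose MI is measured.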
For a given topic k, a simulation draws a new topic-specific term-document matrix from the posterior. Since a topic is simply a multinomial distribution over the words, for each document d we draw from this multinomial the same number of samples as there were words allocated to topic k in d in the model being checked. Under the assumptions of the model, this is how the distribution p(w, d|k) arises. With this simulated topic-specific term-document matrix in hand, we recalculate the MI. The process is repeated n_reps times to obtain a reference distribution against which the value from mi_topic is compared.
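The simulation procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: it assumes the counts assigned to topic k are available as a words-by-documents matrix, and it estimates the topic's word distribution from that matrix's row totals rather than from the model's (possibly smoothed) posterior. The names topic_mi and simulate_mi are hypothetical.

```r
# Assumption: `counts` is the words-by-documents matrix of tokens assigned
# to topic k; the topic's word distribution is estimated from its row
# totals (the real model may use a smoothed posterior estimate instead).

# MI between words and documents within the topic:
# MI = sum_{w,d} p(w,d|k) * log(p(w,d|k) / (p(w|k) * p(d|k)))
topic_mi <- function(counts) {
    p_wd <- counts / sum(counts)
    p_w <- rowSums(p_wd)
    p_d <- colSums(p_wd)
    expected <- outer(p_w, p_d)   # p(w|k) * p(d|k) for every cell
    nz <- p_wd > 0                # cells with zero mass contribute nothing
    sum(p_wd[nz] * log(p_wd[nz] / expected[nz]))
}

# Each replicate gives every document as many draws from the topic's word
# distribution as it had tokens assigned to the topic, then recomputes MI.
simulate_mi <- function(counts, n_reps = 20) {
    phi <- rowSums(counts) / sum(counts)
    n_d <- colSums(counts)
    replicate(n_reps, {
        sim <- vapply(n_d, function(n) rmultinom(1, n, phi)[, 1],
                      numeric(length(phi)))
        topic_mi(sim)
    })
}

set.seed(1)
counts <- matrix(rpois(50, 3), nrow = 10)
observed <- topic_mi(counts)
simulated <- simulate_mi(counts, n_reps = 20)
```

The vector `simulated` plays the role of the reference distribution, and `observed` is the value checked against it. As a sanity check on topic_mi, a count matrix that factors exactly into word and document margins has an MI of zero, while a perfectly diagonal two-word, two-document matrix has an MI of log 2.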
The function returns a single-row data frame with topic, mi, and deviance columns. The deviance column is the observed MI standardized by the mean and standard deviation of the simulated values. The vector of simulated values is available as the "simulated" attribute of the returned data frame.
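The standardization behind deviance is (observed MI minus the mean of the simulated MI values) divided by their standard deviation. The sketch below only restates the documented return value in base R; the helper name build_check_result is hypothetical, and how mi_check assembles its result internally is not described here.

```r
# Hypothetical helper mirroring the documented return value: a one-row
# data frame with topic, mi, and deviance, plus the simulated values
# stored as the "simulated" attribute.
build_check_result <- function(k, mi_obs, simulated) {
    # Standardize the observed MI against the simulated reference values
    deviance <- (mi_obs - mean(simulated)) / sd(simulated)
    result <- data.frame(topic = k, mi = mi_obs, deviance = deviance)
    attr(result, "simulated") <- simulated   # keep the reference draws
    result
}

res <- build_check_result(k = 3, mi_obs = 2.5, simulated = c(1, 2, 3))
res$deviance               # 0.5
attr(res, "simulated")     # 1 2 3
```

A deviance far from zero means the observed MI lies well outside the range the simulations produce, which is the kind of large deviation the description associates with poorer fit.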
Mimno, D., and Blei, D. 2011. Bayesian Checking for Topic Models. Empirical Methods in Natural Language Processing. http://www.cs.columbia.edu/~blei/papers/MimnoBlei2011.pdf.
See also: imi_check, mi_topic