mi_check: Posterior predictive checking for topics

mi_check R Documentation

Posterior predictive checking for topics

Description

This function provides a way to check the fit of the topic model by comparing the mutual information (MI) observed for a topic with values derived from simulations from the posterior. Large deviations from the simulated values may indicate a poorer fit.

Usage

mi_check(m, k, groups = NULL, n_reps = 20)

Arguments

m

mallet_model object with sampling state loaded via load_sampling_state

k

topic number (calculations are only done for one topic at a time)

groups

optional grouping factor for documents. If supplied, the MI is calculated for words over groups rather than over individual documents.

n_reps

number of simulations
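
As an illustration of what calculating over groups rather than documents could mean, the following base R sketch collapses the document columns of a term-document matrix by a grouping factor. The matrix tdm, the factor groups, and the result tgm are hypothetical values invented for this example; this is not the package's internal code.

```r
# Hypothetical topic-specific term-document counts: 3 words x 4 documents.
# matrix() fills column by column, so each line below is one document.
tdm <- matrix(c(2, 0, 1,
                1, 3, 0,
                0, 2, 2,
                4, 1, 0),
              nrow = 3,
              dimnames = list(c("w1", "w2", "w3"), paste0("d", 1:4)))

# Hypothetical grouping factor, one entry per document
groups <- factor(c("1990s", "1990s", "2000s", "2000s"))

# rowsum() adds up rows that share a group label, so transpose first to put
# documents on the rows, then transpose back to a term-group matrix
tgm <- t(rowsum(t(tdm), group = groups))
tgm
```

Any statistic computed on tgm then describes how the topic's words vary across groups instead of across single documents.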

Details

For a given topic k, each simulation draws a new topic-specific term-document matrix from the posterior. Since a topic is simply a multinomial distribution over words, for each document d we draw as many samples from this multinomial as there were words allocated to topic k in d in the model being checked. Under the assumptions of the model, this is how the distribution p(w, d|k) arises. With the simulated term-document matrix in hand, we recalculate the MI. The process is repeated n_reps times to obtain a reference distribution against which the value from mi_topic can be compared.
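
The simulation described here can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names mutual_information and simulate_tdm, the inputs phi_k (the topic's word probabilities) and n_dk (the number of words allocated to topic k in each document), and the choice of log base 2 are all invented for the example.

```r
# Mutual information between words (rows) and documents (columns) of a
# count matrix, using empirical joint and marginal probabilities.
# The log base (2) is an assumption of this sketch.
mutual_information <- function(tdm) {
  p_wd <- tdm / sum(tdm)          # joint p(w, d | k)
  p_w <- rowSums(p_wd)            # marginal p(w | k)
  p_d <- colSums(p_wd)            # marginal p(d | k)
  expected <- outer(p_w, p_d)     # joint under independence
  nz <- p_wd > 0                  # treat 0 * log(0) as 0
  sum(p_wd[nz] * log2(p_wd[nz] / expected[nz]))
}

# One replicate: for each document, draw as many words from the topic's
# multinomial as were allocated to the topic in that document.
simulate_tdm <- function(phi_k, n_dk) {
  vapply(n_dk,
         function(n) rmultinom(1, size = n, prob = phi_k)[, 1],
         integer(length(phi_k)))
}

set.seed(1)
phi_k <- c(0.5, 0.3, 0.15, 0.05)  # hypothetical topic-word probabilities
n_dk <- c(40, 25, 60)             # hypothetical per-document counts for topic k

# Reference distribution of MI values from 20 simulated matrices
sims <- replicate(20, mutual_information(simulate_tdm(phi_k, n_dk)))
summary(sims)
```

Because every simulated document is drawn from the same multinomial, the simulated MI values reflect sampling noise only, which is what makes them a usable baseline for the observed value.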

Value

A single-row data frame with topic, mi, and deviance columns. The deviance column is the observed MI standardized by the mean and standard deviation of the simulated values. The vector of simulated values is available as the "simulated" attribute of the returned data frame.
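
To make the shape of the return value concrete, the sketch below builds a data frame with the same columns and attribute from made-up numbers. It assumes that "standardized" means the usual z-score, (observed - mean) / sd. The helper standardize_mi and all values are hypothetical, and the code does not call mi_check itself.

```r
# z-score of an observed MI against a vector of simulated MI values
# (an assumed reading of "standardized by the mean and standard deviation")
standardize_mi <- function(mi_observed, mi_simulated) {
  (mi_observed - mean(mi_simulated)) / sd(mi_simulated)
}

mi_observed <- 0.42                               # hypothetical observed MI
mi_simulated <- c(0.10, 0.12, 0.08, 0.11, 0.09)   # hypothetical simulated MIs

# Mimic the documented structure: one row, three columns, one attribute
result <- data.frame(
  topic = 5,
  mi = mi_observed,
  deviance = standardize_mi(mi_observed, mi_simulated)
)
attr(result, "simulated") <- mi_simulated

result$deviance               # standardized deviation from the simulations
attr(result, "simulated")     # the reference distribution itself
```

With a real result from mi_check, the same attr() call retrieves the simulated values, for example to plot them against the observed MI.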

References

Mimno, D., and Blei, D. 2011. Bayesian Checking for Topic Models. Empirical Methods in Natural Language Processing. http://www.cs.columbia.edu/~blei/papers/MimnoBlei2011.pdf.

See Also

imi_check, mi_topic


agoldst/dfrtopics documentation built on July 15, 2022, 4:13 p.m.