mi_topic | R Documentation |
Calculates the mutual information of words and documents within a given topic. This measures the degree to which the estimated distribution over words within the topic violates the assumption that it is independent of the distribution of words over documents.
mi_topic(m, k, groups = NULL)
m |
|
k |
topic number (calculations are only done for one topic at a time) |
groups |
optional grouping factor for documents. If omitted, the MI of words and individual documents is calculated. |
The mutual information is given by
MI(W, D|K=k) = ∑_{w, d} p(w, d|k) \log\frac{p(w, d|k)}{p(w|k) p(d|k)}
In the limit of true independence, the fraction in the log is one and the MI is zero. In general, we can rewrite the sum as
∑_d p(d|k) ∑_w p(w|d, k) \log\frac{p(w|d, k)}{p(w|k)}
which is E_D(KL(W|d, W)), the expected KL divergence of the conditional distribution p(w|d, k) from the marginal distribution p(w|k). It can be shown with some algebra that
MI(W, D|K=k) = ∑_{w} p(w|k) IMI(w|k)
where the IMI is defined as specified in the Details for imi_topic. This is the formula used for calculation here.
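As a rough numerical check, the following base R sketch computes the three forms of the MI given above on a small made-up joint distribution and compares them. It does not call the package and is not how mi_topic is implemented. The counts and all variable names are illustrative, and the IMI is assumed here to be H(D|k) - H(D|w, k), the entropy of documents in the topic minus the entropy of documents given the word; check this assumption against the Details for imi_topic.

```r
# Toy joint distribution p(w, d | k): rows are words, columns are documents.
# The counts are made up; all entries are positive so log() is well defined.
counts <- matrix(c(4, 1, 1,
                   1, 3, 2,
                   1, 2, 5),
                 nrow = 3, byrow = TRUE)
joint <- counts / sum(counts)

p_w <- rowSums(joint)   # p(w | k)
p_d <- colSums(joint)   # p(d | k)

# (1) Direct double sum over words and documents
mi_direct <- sum(joint * log(joint / outer(p_w, p_d)))

# (2) Expected KL divergence of p(w | d, k) from p(w | k)
p_w_given_d <- sweep(joint, 2, p_d, "/")     # each column sums to 1
kl_by_doc <- colSums(p_w_given_d * log(p_w_given_d / p_w))
mi_expected_kl <- sum(p_d * kl_by_doc)

# (3) Weighted sum of per-word IMI values.
# ASSUMPTION: IMI(w | k) = H(D | k) - H(D | w, k)
entropy <- function(p) -sum(p * log(p))
p_d_given_w <- joint / p_w                   # each row sums to 1
imi <- entropy(p_d) - apply(p_d_given_w, 1, entropy)
mi_from_imi <- sum(p_w * imi)

print(c(direct = mi_direct,
        expected_kl = mi_expected_kl,
        from_imi = mi_from_imi))
```

The three printed values should agree up to floating-point error. With zero cells in the joint distribution, the terms 0 * log(0) would need to be treated as zero explicitly, which this sketch does not do.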
We can replace D with a grouping over documents and the formulas carry over without further change, now expressing the mutual information of those groupings and words within the topic.
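To illustrate the grouped case, the sketch below collapses document columns of a made-up word-by-document count matrix into groups and applies the same MI formula to the collapsed table. The helper mi_joint, the counts, and the grouping factor are hypothetical and only stand in for what the groups argument expresses; the package may organize the computation differently.

```r
# Hypothetical helper: MI of the rows and columns of a table of positive counts.
mi_joint <- function(counts) {
  joint <- counts / sum(counts)
  p_row <- rowSums(joint)
  p_col <- colSums(joint)
  sum(joint * log(joint / outer(p_row, p_col)))
}

# Made-up counts for one topic: 3 words (rows) by 4 documents (columns).
counts <- matrix(c(4, 1, 1, 2,
                   1, 3, 2, 1,
                   1, 2, 5, 3),
                 nrow = 3, byrow = TRUE)

# Illustrative grouping factor with one entry per document.
groups <- factor(c("a", "a", "b", "b"))

# rowsum() sums rows by group, so transpose to put documents in rows,
# sum within groups, then transpose back to words by groups.
grouped_counts <- t(rowsum(t(counts), groups))

mi_docs <- mi_joint(counts)            # MI of words and documents
mi_groups <- mi_joint(grouped_counts)  # MI of words and document groups

print(c(documents = mi_docs, groups = mi_groups))
```

Because each group is a deterministic function of the document, the grouped MI cannot exceed the document-level MI.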
a single value, giving the estimated mutual information.
imi_topic, imi_check, mi_check