dist_from_corpus: Calculate the distance of each topic from the overall corpus...
In topicdoc: Topic-Specific Diagnostics for LDA and CTM Topic Models

dist_from_corpus

R Documentation

Calculate the distance of each topic from the overall corpus token distribution

Description

Calculates the Hellinger distance between each topic's token probabilities (betas) and the overall token probabilities in the corpus.
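To make the calculation concrete, here is a minimal base-R sketch on toy data. It assumes the standard normalized definition of the Hellinger distance, sqrt(sum((sqrt(p) - sqrt(q))^2) / 2), and it assumes the corpus distribution is the column totals of the count matrix divided by the grand total. The helper `hellinger()` and the objects `dtm_counts`, `corpus_dist`, `beta_mat`, and `topic_dists` are hypothetical names used for illustration only; they are not part of topicdoc, and the package's internal implementation may differ (for example, in the normalizing constant).

```r
# Hypothetical helper (not part of topicdoc): Hellinger distance between
# two discrete probability vectors of equal length, using the normalized
# form that is bounded between 0 and 1.
hellinger <- function(p, q) {
  sqrt(sum((sqrt(p) - sqrt(q))^2) / 2)
}

# Toy document-term counts: 3 documents (rows) x 4 tokens (columns).
dtm_counts <- matrix(c(2, 0, 1, 1,
                       0, 3, 1, 0,
                       1, 1, 0, 2),
                     nrow = 3, byrow = TRUE)

# Assumed corpus token distribution: column totals over the grand total.
corpus_dist <- colSums(dtm_counts) / sum(dtm_counts)

# Toy topic-token probabilities (betas): 2 topics x 4 tokens.
# Each row sums to 1.
beta_mat <- matrix(c(0.70, 0.10, 0.10, 0.10,
                     0.25, 0.25, 0.25, 0.25),
                   nrow = 2, byrow = TRUE)

# One distance per topic, so the result has length nrow(beta_mat).
topic_dists <- apply(beta_mat, 1, hellinger, q = corpus_dist)
topic_dists
```

Under this definition, a topic whose token distribution closely tracks the corpus-wide distribution gets a distance near 0, while a topic concentrated on a few tokens gets a larger distance.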

Usage

dist_from_corpus(topic_model, dtm_data)

Arguments

`topic_model`	a fitted topic model object from one of the following: `tm-class`
`dtm_data`	a document-term matrix of token counts coercible to `simple_triplet_matrix`

Value

A vector of distances with length equal to the number of topics in the fitted model

References

Jordan Boyd-Graber, David Mimno, and David Newman, 2014. Care and Feeding of Topic Models: Problems, Diagnostics, and Improvements. CRC Handbooks of Modern Statistical Methods. CRC Press, Boca Raton, Florida.

Examples


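# Using the example from the LDA function in topicmodels
library(topicmodels)
library(topicdoc)
data("AssociatedPress", package = "topicmodels")
# Fit a two-topic LDA model on the first 20 documents
lda <- LDA(AssociatedPress[1:20,], control = list(alpha = 0.1), k = 2)
# Returns one Hellinger distance per topic
dist_from_corpus(lda, AssociatedPress[1:20,])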

topicdoc documentation built on July 17, 2022, 1:05 a.m.