dist_from_corpus: Calculate the distance of each topic from the overall corpus...

View source: R/dist_from_corpus.R

Description

Calculates the Hellinger distance between each topic's token probabilities (betas) and the overall probability of each token in the corpus.
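For reference, the distance described above can be written out as follows. This is a sketch using the standard definition of the Hellinger distance, not a statement of the package's exact implementation. The symbols are assumptions introduced here for illustration: \beta_{k,w} is the probability of token w under topic k, n_{d,w} is the count of token w in document d, p_w is the corpus-wide probability of token w, and V is the vocabulary size. The 1/\sqrt{2} normalization is also assumed; the package may scale the result differently.

```latex
H(\beta_k, p) = \frac{1}{\sqrt{2}}
  \sqrt{\sum_{w=1}^{V} \left( \sqrt{\beta_{k,w}} - \sqrt{p_w} \right)^{2}},
\qquad
p_w = \frac{\sum_{d} n_{d,w}}{\sum_{d} \sum_{w'=1}^{V} n_{d,w'}}
```

Under this normalization the distance lies between 0 and 1. A value near 0 means the topic's token distribution closely resembles the corpus as a whole; larger values mean the topic is more distinct from it.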

Usage

dist_from_corpus(topic_model, dtm_data)

Arguments

topic_model

a fitted topic model object from one of the following: tm-class

dtm_data

a document-term matrix of token counts coercible to simple_triplet_matrix

Value

A vector of distances with length equal to the number of topics in the fitted model

References

Jordan Boyd-Graber, David Mimno, and David Newman, 2014. Care and Feeding of Topic Models: Problems, Diagnostics, and Improvements. CRC Handbooks of Modern Statistical Methods. CRC Press, Boca Raton, Florida.

Examples

# Using the example from the LDA function
library(topicmodels)
data("AssociatedPress", package = "topicmodels")
lda <- LDA(AssociatedPress[1:20,], control = list(alpha = 0.1), k = 2)
dist_from_corpus(lda, AssociatedPress[1:20,])

topicdoc documentation built on Oct. 30, 2019, 11:26 a.m.