tf_df_dist: Calculate the distance between token and document frequencies
In topicdoc: Topic-Specific Diagnostics for LDA and CTM Topic Models

Description

Using the N highest-probability tokens for each topic, calculate the Hellinger distance between the token frequencies and the document frequencies.
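
To make the quantity concrete, here is a minimal base-R sketch of a Hellinger distance between token frequencies and document frequencies for one topic's top tokens. It is an illustration under stated assumptions, not the package's internal code: the helper `hellinger()`, the toy matrix `dtm`, and the vector `top_tokens` are all hypothetical, and normalizing both frequency vectors to sum to 1 is an assumption that may differ from what `tf_df_dist` does internally.

```r
# Hellinger distance between two non-negative vectors.
# Assumption of this sketch: each vector is first normalized to sum to 1.
hellinger <- function(p, q) {
  p <- p / sum(p)
  q <- q / sum(q)
  sqrt(sum((sqrt(p) - sqrt(q))^2) / 2)
}

# Toy document-term matrix: 3 documents (rows) by 4 tokens (columns).
dtm <- matrix(
  c(2, 0, 1, 0,
    3, 1, 0, 0,
    0, 1, 0, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("alpha", "beta", "gamma", "delta"))
)

# Hypothetical top tokens for a single topic.
top_tokens <- c("alpha", "beta", "delta")

# Token frequency: total occurrences of each top token in the corpus.
token_freq <- colSums(dtm[, top_tokens])

# Document frequency: number of documents containing each top token.
doc_freq <- colSums(dtm[, top_tokens] > 0)

# Distance for this topic; 0 means the two distributions coincide.
hellinger(token_freq, doc_freq)
```

With this normalization the distance lies between 0 and 1. A larger value means the topic's top tokens are concentrated in a few documents rather than spread evenly across the documents that contain them.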

Usage

tf_df_dist(topic_model, dtm_data, top_n_tokens = 10)

Arguments

`topic_model`	a fitted topic model object from one of the following: `tm-class`
`dtm_data`	a document-term matrix of token counts coercible to `simple_triplet_matrix`
`top_n_tokens`	an integer indicating the number of top words to consider; the default is 10

Value

A vector of distances with length equal to the number of topics in the fitted model

References

Jordan Boyd-Graber, David Mimno, and David Newman, 2014. Care and Feeding of Topic Models: Problems, Diagnostics, and Improvements. CRC Handbooks of Modern Statistical Methods. CRC Press, Boca Raton, Florida.

Examples


# Using the example from the LDA function
library(topicdoc)
library(topicmodels)
data("AssociatedPress", package = "topicmodels")
lda <- LDA(AssociatedPress[1:20,], control = list(alpha = 0.1), k = 2)
tf_df_dist(lda, AssociatedPress[1:20,])

topicdoc documentation built on July 17, 2022, 1:05 a.m.