topic_diagnostics: Calculate diagnostics for each topic in a topic model

View source: R/topic_diagnostics.R

Description

Generate a dataframe containing the diagnostics for each topic in a topic model

Usage

topic_diagnostics(topic_model, dtm_data, top_n_tokens = 10,
  method = c("gamma_threshold", "largest_gamma"),
  gamma_threshold = 0.2)

Arguments

topic_model

a fitted topic model object from one of the following: tm-class

dtm_data

a document-term matrix of token counts coercible to slam_triplet_matrix where each row is a document, each column is a token, and each entry is the frequency of the token in a given document

top_n_tokens

an integer indicating the number of top words to consider for mean token length

method

a string indicating which method to use: "gamma_threshold" or "largest_gamma"

gamma_threshold

a number between 0 and 1 giving the threshold applied when method is "gamma_threshold"; the default is 0.2
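
The page does not spell out how the two method values differ. As a rough sketch only, assuming gamma refers to the document-topic probability matrix of the fitted model, and assuming each method counts the documents in which a topic is prominent, the difference could look like this in base R. The toy matrix and the names by_threshold and by_largest are hypothetical and are not part of topicdoc.

```r
# Toy document-topic matrix: 4 documents (rows) x 2 topics (columns).
# Values are made up for illustration; each row sums to 1.
gamma <- matrix(c(0.90, 0.10,
                  0.75, 0.25,
                  0.15, 0.85,
                  0.55, 0.45),
                ncol = 2, byrow = TRUE)

gamma_threshold <- 0.2

# Assumed reading of "gamma_threshold": count the documents in which
# each topic's gamma exceeds the threshold.
by_threshold <- colSums(gamma > gamma_threshold)

# Assumed reading of "largest_gamma": count the documents for which
# each topic has the largest gamma.
by_largest <- tabulate(max.col(gamma, ties.method = "first"),
                       nbins = ncol(gamma))

by_threshold  # 3 3
by_largest    # 3 1
```

Under this reading, a document can count toward several topics with the threshold method, whereas the largest-gamma method assigns each document to exactly one topic.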

Value

A dataframe where each row is a topic and each column contains the associated diagnostic values

References

Jordan Boyd-Graber, David Mimno, and David Newman, 2014. Care and Feeding of Topic Models: Problems, Diagnostics, and Improvements. CRC Handbooks of Modern Statistical Methods. CRC Press, Boca Raton, Florida.

Examples

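# Using the example from the LDA function
library(topicmodels)
library(topicdoc)  # provides topic_diagnostics()
data("AssociatedPress", package = "topicmodels")
# Fit a 2-topic model on the first 20 documents
lda <- LDA(AssociatedPress[1:20,], control = list(alpha = 0.1), k = 2)
# One row of diagnostics per topic
topic_diagnostics(lda, AssociatedPress[1:20,])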
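
The example above relies on the default arguments. As a hedged variation, the same call can set the remaining arguments from the Usage section explicitly. The values chosen here (5 tokens, "largest_gamma") are arbitrary illustrations, not recommendations, and the name diag_largest is hypothetical.

```r
library(topicmodels)
library(topicdoc)

data("AssociatedPress", package = "topicmodels")
docs <- AssociatedPress[1:20, ]

# Same small 2-topic model as in the example above
lda <- LDA(docs, control = list(alpha = 0.1), k = 2)

# Use the top 5 tokens per topic and the "largest_gamma" method;
# gamma_threshold is left at its default because it applies to the
# "gamma_threshold" method.
diag_largest <- topic_diagnostics(lda, docs,
                                  top_n_tokens = 5,
                                  method = "largest_gamma")

# Per the Value section, the result has one row per topic
nrow(diag_largest)
```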

topicdoc documentation built on Oct. 30, 2019, 11:26 a.m.