# top.topic.words: Get the Top Words and Documents in Each Topic

In lda: Collapsed Gibbs Sampling Methods for Topic Models

## Description

These functions take a model fitted using lda.collapsed.gibbs.sampler. top.topic.words returns a matrix of the top words in each topic, and top.topic.documents returns a matrix of the top documents in each topic.

## Usage

    top.topic.words(topics, num.words = 20, by.score = FALSE)
    top.topic.documents(document_sums, num.documents = 20, alpha = 0.1)

## Arguments

| Argument | Description |
| --- | --- |
| topics | For top.topic.words, a $K \times V$ matrix where each entry is a numeric proportional to the probability of seeing the word (column) conditioned on the topic (row). This entry is sometimes denoted $\beta_{w,k}$ in the literature; see details. The column names should correspond to the words in the vocabulary. The topics field from the output of lda.collapsed.gibbs.sampler can be used. |
| num.words | For top.topic.words, the number of top words to return for each topic. |
| document_sums | For top.topic.documents, a $K \times D$ matrix where each entry is a numeric proportional to the probability of seeing a topic (row) conditioned on the document (column). This entry is sometimes denoted $\theta_{d,k}$ in the literature; see details. The document_sums field from the output of lda.collapsed.gibbs.sampler can be used. |
| num.documents | For top.topic.documents, the number of top documents to return for each topic. |
| by.score | If by.score is set to FALSE (default), then words in each topic will be ranked according to the probability mass for each word, $\beta_{w,k}$. If by.score is TRUE, then words will be ranked according to a score defined by $\beta_{w,k} \left( \log \beta_{w,k} - \frac{1}{K} \sum_{k'} \log \beta_{w,k'} \right)$. |
| alpha | |
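The difference between the two rankings selected by by.score can be seen on a small example. The following is a minimal base R sketch of the score formula above, not the package's own code: the toy matrix, the variable names, and the assumption that every entry is strictly positive (so that the logarithm is finite) are all illustrative.

```r
## Toy K x V count matrix (hypothetical data): 2 topics, 3 words.
## "the" is frequent in both topics; "apple" and "zebra" are topic-specific.
topics <- matrix(c(10, 9, 1,
                   10, 1, 9),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(NULL, c("the", "apple", "zebra")))

## Normalize each row so that beta[k, w] estimates p(word w | topic k).
beta <- topics / rowSums(topics)

## Score: beta * (log(beta) - average of log(beta) over topics), per word.
## sweep() subtracts the column means; assumes all entries are > 0.
log.beta <- log(beta)
score <- beta * sweep(log.beta, 2, colMeans(log.beta))

## Rank the words within each topic; the result has one column per topic.
rank.words <- function(m) {
  apply(m, 1, function(x) colnames(m)[order(x, decreasing = TRUE)])
}
ranked.by.prob  <- rank.words(beta)
ranked.by.score <- rank.words(score)
```

In this toy case ranking by probability puts "the" first in both topics, while ranking by score puts the topic-specific word first, because a word that is equally likely under every topic has a score of zero.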

## Value

For top.topic.words, a $\text{num.words} \times K$ character matrix where each column contains the top words for that topic.

For top.topic.documents, a $\text{num.documents} \times K$ integer matrix where each column contains the top documents for that topic. The entries in the matrix are column-indexed references into document_sums.
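Because the integer entries are column indices into document_sums, they can index any per-document vector kept in the same order. The sketch below illustrates that indexing with base R only. It ranks documents by plain per-document proportions and ignores alpha, so it is an assumption-laden stand-in rather than the package's implementation; the data and the doc.titles vector are hypothetical.

```r
## Toy K x D matrix (hypothetical data): 2 topics, 4 documents.
document_sums <- matrix(c(9, 1, 5, 0,
                          1, 9, 5, 3),
                        nrow = 2, byrow = TRUE)
doc.titles <- c("doc A", "doc B", "doc C", "doc D")  # hypothetical labels

## Share of each document assigned to each topic (each column sums to 1).
## Assumes no document has a zero column sum.
theta <- sweep(document_sums, 2, colSums(document_sums), "/")

## For each topic (row), keep the indices of the highest-ranked documents.
num.documents <- 2
top.docs <- apply(theta, 1,
                  function(x) order(x, decreasing = TRUE)[1:num.documents])

## Column k of top.docs lists document indices for topic k.
top.titles.topic1 <- doc.titles[top.docs[, 1]]
```

Here top.docs is a 2 x 2 integer matrix, and top.titles.topic1 contains the labels of documents 1 and 3.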

## Author(s)

Jonathan Chang

## References

Blei, David M. and Ng, Andrew and Jordan, Michael. Latent Dirichlet allocation. Journal of Machine Learning Research, 2003.

## See Also

lda.collapsed.gibbs.sampler for the format of topics.

predictive.distribution demonstrates another use for a fitted topic matrix.

## Examples

    ## From demo(lda).
    library(lda)

    data(cora.documents)
    data(cora.vocab)

    K <- 10 ## Num clusters
    result <- lda.collapsed.gibbs.sampler(cora.documents,
                                          K,  ## Num clusters
                                          cora.vocab,
                                          25,  ## Num iterations
                                          0.1,
                                          0.1)

    ## Get the top words in the cluster
    top.words <- top.topic.words(result$topics, 5, by.score=TRUE)

    ## top.words:
    ##      [,1]             [,2]        [,3]       [,4]            [,5]
    ## [1,] "decision"       "network"   "planning" "learning"      "design"
    ## [2,] "learning"       "time"      "visual"   "networks"      "logic"
    ## [3,] "tree"           "networks"  "model"    "neural"        "search"
    ## [4,] "trees"          "algorithm" "memory"   "system"        "learning"
    ## [5,] "classification" "data"      "system"   "reinforcement" "systems"
    ##      [,6]         [,7]       [,8]           [,9]           [,10]
    ## [1,] "learning"   "models"   "belief"       "genetic"      "research"
    ## [2,] "search"     "networks" "model"        "search"       "reasoning"
    ## [3,] "crossover"  "bayesian" "theory"       "optimization" "grant"
    ## [4,] "algorithm"  "data"     "distribution" "evolutionary" "science"
    ## [5,] "complexity" "hidden"   "markov"       "function"     "supported"
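The example above only exercises top.topic.words. A matching call to top.topic.documents might look like the following sketch, which follows the signature given under Usage and assumes the lda package and its cora data sets are installed. The requireNamespace guard is an addition for illustration, and which documents are returned depends on the sampled model, so no output is shown.

```r
## Runs only if the lda package is installed.
if (requireNamespace("lda", quietly = TRUE)) {
  library(lda)
  data(cora.documents)
  data(cora.vocab)

  K <- 10  ## Num clusters
  result <- lda.collapsed.gibbs.sampler(cora.documents, K, cora.vocab,
                                        25,   ## Num iterations
                                        0.1, 0.1)

  ## Integer matrix with one column per topic; each entry is a column
  ## index into result$document_sums (that is, a document number).
  top.docs <- top.topic.documents(result$document_sums, num.documents = 5)
  print(dim(top.docs))
}
```

Each column of top.docs can then be used to index cora.documents or any other per-document vector kept in the same order.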