This function takes a fitted LDA-type model and computes a predictive distribution over new words in a document. This is useful for predicting held-out words.
predictive.distribution(document_sums, topics, alpha, eta)

document_sums 
A K × D matrix where each entry is a numeric value proportional
to the probability of seeing a topic (row) conditioned on a document
(column) (this entry is sometimes denoted θ_{d,k} in the
literature; see details below). Either the document_sums field or
the document_expects field from the output of
lda.collapsed.gibbs.sampler can be used.

topics 
A K × V matrix where each entry is a numeric value proportional
to the probability of seeing a word (column) conditioned on a topic
(row) (this entry is sometimes denoted β_{w,k} in the
literature; see details below). The column names should correspond to the
words in the vocabulary. The topics field from the output of
lda.collapsed.gibbs.sampler can be used.

alpha 
The scalar value of the Dirichlet hyperparameter for topic proportions. See references for details. 
eta 
The scalar value of the Dirichlet hyperparameter for topic multinomials. See references for details. 
The predictive probability of word w in document d is computed as p_d(w) = ∑_k (θ_{d,k} + α)(β_{w,k} + η), where the sum runs over the K topics.
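The R sketch below evaluates the formula above with matrix algebra: the sum over k becomes the product of a V × K matrix and a K × D matrix. The function name predictive_sketch and the toy inputs are hypothetical and this is not the package's own code. Beyond what the formula states, it assumes that each smoothed factor is normalized (topic proportions over topics, word weights over words), so that every column of the result is a probability distribution over words.

```r
## Hypothetical sketch of p_d(w) = sum_k (theta_{d,k} + alpha) * (beta_{w,k} + eta).
## ASSUMPTION: both smoothed factors are normalized, so each column of the
## result sums to 1. This is not necessarily how the lda package does it.
predictive_sketch <- function(document_sums, topics, alpha, eta) {
  K <- nrow(document_sums)  # number of topics
  V <- ncol(topics)         # vocabulary size
  ## Smoothed topic proportions (K x D); each column sums to 1.
  theta <- sweep(document_sums + alpha, 2,
                 colSums(document_sums) + K * alpha, "/")
  ## Smoothed word distributions (K x V); each row sums to 1.
  beta <- sweep(topics + eta, 1, rowSums(topics) + V * eta, "/")
  ## Summing over topics: (V x K) %*% (K x D) gives a V x D matrix.
  p <- t(beta) %*% theta
  rownames(p) <- colnames(topics)
  p
}

## Toy inputs: K = 2 topics, D = 2 documents, V = 3 words.
document_sums <- matrix(c(3, 0,   # document 1: counts for topics 1 and 2
                          1, 2),  # document 2: counts for topics 1 and 2
                        nrow = 2)
topics <- matrix(c(2, 0,   # word "a": counts in topics 1 and 2
                   1, 1,   # word "b"
                   0, 3),  # word "c"
                 nrow = 2,
                 dimnames = list(NULL, c("a", "b", "c")))
p <- predictive_sketch(document_sums, topics, alpha = 0.1, eta = 0.1)
colSums(p)  # each column sums to 1
```

Because α and η are added before normalizing, a word with a zero count in a topic (here "c" in topic 1) still receives positive probability.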
A V × D matrix giving the probability of seeing a word (row) in a document (column). The row names of the matrix are set to the column names of topics.
Jonathan Chang (slycoder@gmail.com)
Blei, David M. and Ng, Andrew and Jordan, Michael. Latent Dirichlet allocation. Journal of Machine Learning Research, 2003.
lda.collapsed.gibbs.sampler for the format of topics and document_sums and for details of the model.
top.topic.words demonstrates another use for a fitted topic matrix.
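top.topic.words works on a K × V matrix (topics in rows, words in columns), which is why the examples pass t(predictions) rather than the V × D matrix itself. As a rough base-R sketch of the same idea applied directly to the V × D output, the hypothetical helper top_words_per_column (not part of the lda package) sorts each column and keeps the row names of the n largest entries, giving an n × D matrix of words.

```r
## Hypothetical helper (not part of the lda package): the n most probable
## words for each document, read from the row names of a V x D matrix.
top_words_per_column <- function(pred, n = 5) {
  apply(pred, 2, function(col) {
    ## col keeps the row names (words) of pred as its names
    names(sort(col, decreasing = TRUE))[seq_len(n)]
  })
}

## Toy V x D matrix standing in for predictive.distribution output.
pred <- matrix(c(0.5, 0.3, 0.2,   # document 1
                 0.1, 0.2, 0.7),  # document 2
               nrow = 3,
               dimnames = list(c("a", "b", "c"), NULL))
top <- top_words_per_column(pred, 2)
top
```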
library(lda)
## Fit a model (from demo(lda)).
data(cora.documents)
data(cora.vocab)
K <- 10 ## Num clusters
result <- lda.collapsed.gibbs.sampler(cora.documents,
                                      K,     ## Num clusters
                                      cora.vocab,
                                      25,    ## Num iterations
                                      0.1,   ## alpha
                                      0.1)   ## eta
## Predict new words for the first two documents
predictions <- predictive.distribution(result$document_sums[, 1:2],
                                       result$topics,
                                       0.1, 0.1)
## Use top.topic.words to show the top 5 predictions in each document.
top.topic.words(t(predictions), 5)
## Output (varies from run to run because the sampler is stochastic):
##      [,1]         [,2]
## [1,] "learning"   "learning"
## [2,] "algorithm"  "paper"
## [3,] "model"      "problem"
## [4,] "paper"      "results"
## [5,] "algorithms" "system"
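Since the stated use of this function is predicting held-out words, one common way to score such predictions is the average log probability that the predictive distribution assigns to words actually held out from each document. The helper heldout_log_lik and its input format (a list of character vectors, one per document) are hypothetical and chosen for illustration; this is not the lda package's document format, and the sketch assumes every held-out word appears in the row names of the prediction matrix.

```r
## Hypothetical helper: mean log predictive probability of held-out words.
## pred:    V x D matrix with words as row names (as returned by
##          predictive.distribution).
## heldout: list of D character vectors; element d holds the held-out words
##          of document d. This format is an assumption for illustration.
heldout_log_lik <- function(pred, heldout) {
  stopifnot(length(heldout) == ncol(pred),
            all(unlist(heldout) %in% rownames(pred)))
  ll <- unlist(lapply(seq_along(heldout), function(d) {
    log(pred[heldout[[d]], d])  # look up p_d(w) by word name
  }))
  mean(ll)
}

## Toy example with a small V x D matrix.
pred <- matrix(c(0.5, 0.3, 0.2,
                 0.1, 0.2, 0.7),
               nrow = 3,
               dimnames = list(c("a", "b", "c"), NULL))
heldout <- list(c("a", "c"), "c")
score <- heldout_log_lik(pred, heldout)
score
```

Higher (less negative) scores mean the model assigns more probability to the words that were actually held out.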
