predictive.distribution | R Documentation
Description

This function takes a fitted LDA-type model and computes a predictive distribution for new words in a document. This is useful for making predictions about held-out words.
Usage

predictive.distribution(document_sums, topics, alpha, eta)
Arguments

document_sums: A K \times D matrix of counts, one row per topic and one column per document, giving the number of times words in each document have been assigned to each topic; typically the document_sums field returned by lda.collapsed.gibbs.sampler.

topics: A K \times V matrix of counts, one row per topic and one column per word in the vocabulary, giving the number of times each word has been assigned to each topic; typically the topics field returned by lda.collapsed.gibbs.sampler.

alpha: The scalar value of the Dirichlet hyperparameter for topic proportions. See references for details.

eta: The scalar value of the Dirichlet hyperparameter for topic multinomials. See references for details.
Details

The predictive probability of word w in document d is computed as

p_d(w) = \sum_k (\theta_{d, k} + \alpha) (\beta_{w, k} + \eta),

where \theta_{d, k} is the entry of document_sums for topic k in document d, and \beta_{w, k} is the entry of topics for word w in topic k.
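Read directly, the formula is a matrix product of the two smoothed count matrices. The following is a minimal sketch in plain R with toy counts (all values and word names here are hypothetical, chosen only to illustrate the arithmetic; this is not the package's internal code):

```r
## Toy counts: K = 2 topics, V = 3 words, D = 2 documents.
document_sums <- matrix(c(3, 1,
                          0, 4), nrow = 2)   ## K x D: topic counts per document
topics <- matrix(c(5, 1, 2,
                   0, 3, 3), nrow = 2, byrow = TRUE,
                 dimnames = list(NULL, c("apple", "banana", "cherry")))  ## K x V
alpha <- 0.1
eta   <- 0.1

## p_d(w) = sum_k (theta_{d,k} + alpha) * (beta_{w,k} + eta),
## computed for all words and documents at once as a V x D matrix.
predictions <- t(topics + eta) %*% (document_sums + alpha)

## The row names come from the column names of topics, as in the Value section.
rownames(predictions)
```

Note that transposing the smoothed topics matrix before the product is what makes the result V \times D with word row names.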
Value

A V \times D matrix of the probability of seeing a word (row) in a document (column). The row names of the matrix are set to the column names of topics.
Author(s)

Jonathan Chang (slycoder@gmail.com)
References

Blei, David M., Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 2003.
See Also

lda.collapsed.gibbs.sampler for the format of topics and document_sums and details of the model.

top.topic.words demonstrates another use for a fitted topic matrix.
Examples

library(lda)

## Fit a model (from demo(lda)).
data(cora.documents)
data(cora.vocab)
K <- 10 ## Num clusters
result <- lda.collapsed.gibbs.sampler(cora.documents,
                                      K,   ## Num clusters
                                      cora.vocab,
                                      25,  ## Num iterations
                                      0.1, ## alpha
                                      0.1) ## eta
## Predict new words for the first two documents
predictions <- predictive.distribution(result$document_sums[, 1:2],
                                       result$topics,
                                       0.1, 0.1)
## Use top.topic.words to show the top 5 predictions in each document.
top.topic.words(t(predictions), 5)
## [,1] [,2]
## [1,] "learning" "learning"
## [2,] "algorithm" "paper"
## [3,] "model" "problem"
## [4,] "paper" "results"
## [5,] "algorithms" "system"
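The ranking that top.topic.words performs on the transposed prediction matrix can also be sketched by hand: for each document, sort the words by predictive probability and keep the largest entries. The following illustration uses a hypothetical 3-word, 2-document matrix (values invented for the example, not output of the fit above):

```r
## Toy V x D prediction matrix (hypothetical values) with word row names.
predictions <- matrix(c(0.5, 0.3, 0.2,
                        0.1, 0.6, 0.3),
                      nrow = 3,
                      dimnames = list(c("learning", "model", "paper"), NULL))

## For each document (column), order words by descending probability and
## keep the top 2 -- the same ranking top.topic.words(t(predictions), 2)
## would report per document.
top_words <- apply(predictions, 2,
                   function(p) names(sort(p, decreasing = TRUE))[1:2])

top_words
## Document 1: "learning", "model"; document 2: "model", "paper"
```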