Description Usage Arguments Details Value Author(s) References See Also Examples
This function takes a fitted LDA-type model and computes a predictive distribution for new words in a document. This is useful for making predictions about held-out words.
1 | predictive.distribution(document_sums, topics, alpha, eta)
|
document_sums |
A K \times D matrix where each entry is a numeric proportional
to the probability of seeing a topic (row) conditioned on document
(column) (this entry is sometimes denoted θ_{d,k} in the
literature, see details). Either the document_sums field or
the document_expects field from the output of
|
topics |
A K \times V matrix where each entry is a numeric proportional
to the probability of seeing the word (column) conditioned on topic
(row) (this entry is sometimes denoted β_{w,k} in the
literature, see details). The column names should correspond to the
words in the vocabulary. The topics field from the output of
|
alpha |
The scalar value of the Dirichlet hyperparameter for topic proportions. See references for details. |
eta |
The scalar value of the Dirichlet hyperparamater for topic multinomials. See references for details. |
The formula used to compute predictive probability is p_d(w) = ∑_k (θ_{d, k} + α) (β_{w, k} + η).
A V \times D matrix of the probability of seeing a word (row) in a document (column). The row names of the matrix are set to the column names of topics.
Jonathan Chang (slycoder@gmail.com)
Blei, David M. and Ng, Andrew and Jordan, Michael. Latent Dirichlet allocation. Journal of Machine Learning Research, 2003.
lda.collapsed.gibbs.sampler
for the format of
topics and document_sums and details of the model.
top.topic.words
demonstrates another use for a fitted
topic matrix.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | ## Fit a model (from demo(lda)).
data(cora.documents)
data(cora.vocab)
K <- 10 ## Num clusters
result <- lda.collapsed.gibbs.sampler(cora.documents,
K, ## Num clusters
cora.vocab,
25, ## Num iterations
0.1,
0.1)
## Predict new words for the first two documents
predictions <- predictive.distribution(result$document_sums[,1:2],
result$topics,
0.1, 0.1)
## Use top.topic.words to show the top 5 predictions in each document.
top.topic.words(t(predictions), 5)
## [,1] [,2]
## [1,] "learning" "learning"
## [2,] "algorithm" "paper"
## [3,] "model" "problem"
## [4,] "paper" "results"
## [5,] "algorithms" "system"
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.