Extracts outputs from an LDA model estimated with the lda package by Jonathan Chang.
FormatRawLdaOutput(lda_result, docnames, smooth = TRUE, softmax = FALSE)
lda_result
The list value returned by
docnames
A character vector giving the names of documents. This is generally rownames(dtm).
smooth
Logical. Should the topic proportions be smoothed so that every term has a positive value in every topic? Defaults to TRUE.
softmax
Logical. Should the softmax function be used to normalize the raw output? If FALSE (the default), the output is normalized by dividing each row by its sum.
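The difference between the smooth and softmax options can be sketched in base R. This is a minimal illustration, not the package's implementation: the count matrix is made up, and the smoothing constant eps is an assumption chosen only to show the effect.

```r
# Hypothetical raw topic-by-term counts (2 topics, 3 terms); not from a fitted model.
raw <- matrix(c(4, 0, 1,
                0, 3, 2), nrow = 2, byrow = TRUE)

# Sum normalization: divide each row by its total. Zero counts stay zero.
sum_norm <- raw / rowSums(raw)

# Softmax normalization: exponentiate, then divide each row by its total.
soft_norm <- exp(raw) / rowSums(exp(raw))

# Smoothing (assumed form): add a small constant before sum normalization,
# so every term gets a positive value in every topic.
eps <- 0.01  # illustrative constant, not taken from the package
smoothed <- (raw + eps) / rowSums(raw + eps)
```

In all three cases each row sums to 1. Only sum normalization without smoothing leaves exact zeros in the result.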
Returns a list with two elements: phi, whose rows represent the distribution of words across a topic, and theta, whose rows represent the distribution of topics across a document.
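The shape of the two matrices can be illustrated with a stand-in for the raw sampler output. This sketch assumes the raw list carries topics (topics by terms) and document_sums (topics by documents), as in the lda package's return value. The counts, document names, and the "t_" topic labels are hypothetical, and the code is not the function's actual implementation.

```r
# Hypothetical stand-in for raw sampler output: 2 topics, 3 terms, 2 documents.
raw <- list(
  topics = matrix(c(5, 1, 0,
                    0, 2, 6), nrow = 2, byrow = TRUE,
                  dimnames = list(NULL, c("gene", "cell", "trial"))),
  document_sums = matrix(c(4, 1,
                           2, 7), nrow = 2, byrow = TRUE)
)
docnames <- c("doc_a", "doc_b")

# phi: one row per topic, one column per term; each row sums to 1.
phi <- raw$topics / rowSums(raw$topics)
rownames(phi) <- paste0("t_", seq_len(nrow(phi)))  # hypothetical topic labels

# theta: transpose so documents are rows, then normalize each row.
theta <- t(raw$document_sums)
theta <- theta / rowSums(theta)
dimnames(theta) <- list(docnames, rownames(phi))

result <- list(theta = theta, phi = phi)
```

Here result$phi is 2 x 3 (topics by terms) and result$theta is 2 x 2 (documents by topics).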
# Load textmineR, which provides the sample data and helper functions
library(textmineR)
# Load a pre-formatted dtm
data(nih_sample_dtm)
# Get a sample of 20 documents
dtm <- nih_sample_dtm[ sample(seq_len(nrow(nih_sample_dtm)), 20) , ]
# Re-create a character vector of documents from the DTM
lex <- Dtm2Docs(dtm)
# Format for input to lda::lda.collapsed.gibbs.sampler
lex <- lda::lexicalize(lex, vocab=colnames(dtm))
# Fit the model with lda::lda.collapsed.gibbs.sampler
lda <- lda::lda.collapsed.gibbs.sampler(documents = lex, K = 5,
                                        vocab = colnames(dtm),
                                        num.iterations=200,
                                        alpha=0.1, eta=0.05)
# Format the result to get phi and theta matrices
lda <- FormatRawLdaOutput(lda_result=lda, docnames=rownames(dtm), smooth=TRUE)
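Once phi and theta are available, a common next step is to read off the top terms per topic and the dominant topic per document. The sketch below uses small hypothetical matrices in place of the formatted output so that it runs without fitting a model; the term, topic, and document names are made up.

```r
# Hypothetical stand-ins for the phi and theta elements of the returned list.
phi <- matrix(c(0.6, 0.3, 0.1,
                0.1, 0.2, 0.7),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("t_1", "t_2"), c("gene", "cell", "trial")))
theta <- matrix(c(0.9, 0.1,
                  0.2, 0.8),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("doc_a", "doc_b"), c("t_1", "t_2")))

# Top two terms per topic, ordered by probability within each row of phi.
# The result has one column per topic.
top_terms <- apply(phi, 1, function(p) names(sort(p, decreasing = TRUE))[1:2])

# Most probable topic for each document.
main_topic <- colnames(theta)[apply(theta, 1, which.max)]
names(main_topic) <- rownames(theta)
```

With real output, the same two expressions would be applied to the phi and theta elements of the list returned by FormatRawLdaOutput.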