FormatRawLdaOutput: Format Raw Output from 'lda.collapsed.gibbs.sampler'
In ChengMengli/topic: Functions for Text Mining and Topic Modeling


Extracts the phi and theta matrices from an LDA model estimated with `lda.collapsed.gibbs.sampler` from the lda package by Jonathan Chang.

FormatRawLdaOutput(lda_result, docnames, smooth = TRUE, softmax = FALSE)

`lda_result`	The list value returned by `lda.collapsed.gibbs.sampler`
`docnames`	A character vector giving the names of the documents. This is generally `rownames(dtm)`.
`smooth`	Logical. Should the output be smoothed so that every term has a positive value in every topic? Defaults to `TRUE`.
`softmax`	Logical. Should the raw output be normalized with the softmax function? If `FALSE` (the default), the output is normalized by dividing by the sum.
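
The difference between the two normalization modes, and the effect of smoothing, can be illustrated on a toy count matrix. This is only a sketch and not the package's source: the counts are made up, and the smoothing constant `eps` is an assumed value chosen for illustration.

```r
# Toy topic-by-term count matrix (2 topics, 3 terms); values are made up
counts <- matrix(c(4, 0, 1,
                   0, 3, 3),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("t_1", "t_2"), c("a", "b", "c")))

# Sum normalization: divide each row by its total.
# Terms with a zero count keep a probability of exactly zero.
sum_norm <- counts / rowSums(counts)

# Smoothing (assumed form): add a small constant before normalizing,
# so every term gets a strictly positive probability.
eps <- 1e-4  # hypothetical constant, not taken from the package
smoothed <- (counts + eps) / rowSums(counts + eps)

# Softmax normalization: exponentiate, then divide each row by its total.
# This is also strictly positive, but it exaggerates large counts.
softmax_norm <- exp(counts) / rowSums(exp(counts))
```

In all three cases each row sums to one; only `sum_norm` can contain exact zeros.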

Returns a list with two elements: `phi`, whose rows represent the distribution of words across a topic, and `theta`, whose rows represent the distribution of topics across a document.
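
To make the shape of the return value concrete, the sketch below builds `phi` and `theta` by hand from a mock of the raw sampler output. It relies on the `topics` (topics by terms) and `document_sums` (topics by documents) elements returned by `lda.collapsed.gibbs.sampler`; the counts, the vocabulary, and the `t_1`, `t_2` topic labels are hypothetical, and smoothing is omitted for brevity.

```r
# Mock of the raw sampler output, holding only the two elements used here.
# `topics` is K x V (topic-by-term counts); `document_sums` is K x D.
vocab <- c("gene", "cell", "trial", "dose")   # made-up vocabulary
docnames <- c("doc_1", "doc_2", "doc_3")      # made-up document names
raw <- list(
  topics = matrix(c(5, 2, 0, 1,
                    0, 1, 4, 3),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(NULL, vocab)),
  document_sums = matrix(c(3, 0, 2,
                           1, 4, 2),
                         nrow = 2, byrow = TRUE)
)

# phi: one row per topic, each row a distribution over words
phi <- raw$topics / rowSums(raw$topics)
rownames(phi) <- paste0("t_", seq_len(nrow(phi)))  # hypothetical labels

# theta: one row per document, each row a distribution over topics
theta <- t(raw$document_sums)
theta <- theta / rowSums(theta)
dimnames(theta) <- list(docnames, rownames(phi))

result <- list(phi = phi, theta = theta)
```

Here `result$phi` has one row per topic and one column per vocabulary term, while `result$theta` has one row per document and one column per topic.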

# Load a pre-formatted DTM
data(nih_sample_dtm)

# Get a sample of documents
dtm <- nih_sample_dtm[ sample(1:nrow(nih_sample_dtm), 20) , ]

# Re-create a character vector of documents from the DTM
lex <- Dtm2Docs(dtm)

# Format for input to lda::lda.collapsed.gibbs.sampler
lex <- lda::lexicalize(lex, vocab = colnames(dtm))

# Fit the model with lda::lda.collapsed.gibbs.sampler
lda <- lda::lda.collapsed.gibbs.sampler(documents = lex, K = 5,
                                         vocab = colnames(dtm),
                                         num.iterations = 200,
                                         alpha = 0.1, eta = 0.05)

# Format the result to get phi and theta matrices
lda <- FormatRawLdaOutput(lda_result = lda, docnames = rownames(dtm), smooth = TRUE)