Description
This implements the Gibbs-EM algorithm for LDA described in Wallach (2006), "Topic Modeling: Beyond Bag-of-Words."
Usage

lda_cgs_em_perplexity(num_topics, vocab_size, docs_tf, alpha_h, eta_h,
  em_max_iter, gibbs_max_iter, burn_in, spacing, save_theta, save_beta,
  save_lp, verbose, test_set_share)
Arguments

num_topics      Number of topics in the corpus.
vocab_size      Vocabulary size.
docs_tf         A list of corpus documents read from the Blei corpus.
alpha_h         Hyperparameter for θ sampling.
eta_h           Smoothing parameter for the β matrix.
em_max_iter     Maximum number of EM iterations to be performed.
gibbs_max_iter  Maximum number of Gibbs iterations to be performed.
burn_in         Burn-in period for the Gibbs sampler.
spacing         Spacing between stored samples (to reduce autocorrelation).
save_theta      If 0, the function does not save θ samples.
save_beta       If 0, the function does not save β samples.
save_lp         If 0, the function does not save the log posterior computed at each iteration.
verbose         Verbosity level: 0, 1, or 2.
test_set_share  Proportion of test (held-out) words in each document; must be between 0 and 1.
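A hypothetical end-to-end call is sketched below. The docs_tf layout (each document as a 2 x N matrix of word indices and term frequencies, mirroring Blei's LDA-C format) and the 0-based word indices are assumptions, not confirmed by this page; consult the package's corpus reader for the actual structure.

# Build a tiny synthetic corpus; the 2 x N (word index, count) layout
# per document is an assumption about what docs_tf expects.
set.seed(2018)
vocab_size <- 50
docs_tf <- lapply(1:20, function(d) {
  ids <- sample(0:(vocab_size - 1), 10)        # distinct word indices
  rbind(ids, sample(1:5, 10, replace = TRUE))  # term frequencies
})

mc <- lda_cgs_em_perplexity(
  num_topics     = 4,
  vocab_size     = vocab_size,
  docs_tf        = docs_tf,
  alpha_h        = 0.1,   # initial hyperparameter for theta sampling
  eta_h          = 1,     # initial smoothing for the beta matrix
  em_max_iter    = 10,
  gibbs_max_iter = 1000,
  burn_in        = 500,
  spacing        = 10,
  save_theta     = 0,
  save_beta      = 0,
  save_lp        = 1,
  verbose        = 1,
  test_set_share = 0.2    # hold out 20% of each document's words
)
mc$perplexity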
Details

This function uses the LDA collapsed Gibbs sampler, a Markov chain on z, for the E-step, and Minka (2003) fixed-point iterations to optimize h = (η, α) in the M-step. To compute perplexity, it first partitions each document in the corpus into two sets of words: (a) a test (held-out) set and (b) a training set, according to the user-defined test_set_share. It then runs the Markov chain on the training set and computes perplexity on the held-out set.
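The following is a minimal sketch of the ideas above, not the package's internal code. The function names (split_doc, perplexity, update_alpha) are illustrative, and the M-step update assumes a symmetric α for simplicity.

# Split one document's word tokens into training and held-out sets.
split_doc <- function(word_ids, test_set_share) {
  n_test <- floor(length(word_ids) * test_set_share)
  test_idx <- sample(seq_along(word_ids), n_test)
  list(train = word_ids[-test_idx], test = word_ids[test_idx])
}

# Held-out perplexity: exp of the negative average held-out log likelihood,
# where log_pred holds log p(w | trained model) for each held-out word.
perplexity <- function(log_pred) exp(-mean(log_pred))

# Minka-style fixed-point update for a symmetric alpha, given the D x K
# matrix n_dk of per-document topic counts (one M-step iteration).
update_alpha <- function(alpha, n_dk) {
  K   <- ncol(n_dk)
  n_d <- rowSums(n_dk)
  num <- sum(digamma(n_dk + alpha) - digamma(alpha))
  den <- K * sum(digamma(n_d + K * alpha) - digamma(K * alpha))
  alpha * num / den
}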
Value

The Markov chain output as a list of:

corpus_topic_counts  Corpus-level topic counts from the last iteration of the Markov chain.
theta_counts         Document-level topic counts from the last iteration of the Markov chain.
beta_counts          Topic word counts from the last iteration of the Markov chain.
theta_samples        θ samples after the burn-in period, if save_theta is nonzero.
beta_samples         β samples after the burn-in period, if save_beta is nonzero.
log_posterior        The log posterior (up to a constant multiplier) of the hidden variables ψ = (β, θ, z) in the LDA model, if save_lp is nonzero.
perplexity           Perplexity of the held-out word set.
See Also

Other MCMC: lda_acgs_st, lda_cgs_em, lda_cgs_perplexity, lda_fgs_BF_perplexity, lda_fgs_perplexity, lda_fgs_ppc, lda_fgs_st_perplexity