Description

This implements the collapsed Gibbs sampler for the LDA model, a Markov chain on z.
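As a sketch of what the chain does (illustrative R, not the package's internal code), each scan resamples every token's topic assignment z from its conditional given all other assignments. Assuming count matrices n_dk (document-topic), n_kw (topic-word), and a vector n_k (per-topic totals), all with the current token already decremented:

    # Illustrative only: collapsed Gibbs draw for one token's topic assignment.
    # d = document index, w = word index; alpha_h, eta_h as in lda_cgs();
    # n_dk, n_kw, n_k are counts with the current token removed.
    draw_z <- function(d, w, n_dk, n_kw, n_k, alpha_h, eta_h, vocab_size) {
      p <- (n_dk[d, ] + alpha_h) * (n_kw[, w] + eta_h) /
           (n_k + vocab_size * eta_h)
      sample.int(length(p), size = 1, prob = p)  # sample.int() normalizes prob
    }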
Usage

lda_cgs(num_topics, vocab_size, docs_tf, alpha_h, eta_h, max_iter, burn_in,
        spacing, save_theta, save_beta, save_lp, verbose, test_doc_share = 0,
        test_word_share = 0)
Arguments

num_topics       Number of topics in the corpus.
vocab_size       Vocabulary size.
docs_tf          A list of corpus documents read from the Blei corpus.
alpha_h          Hyperparameter for θ sampling.
eta_h            Smoothing parameter for the β matrix.
max_iter         Maximum number of Gibbs iterations to be performed.
burn_in          Burn-in period for the Gibbs sampler.
spacing          Spacing between stored samples (to reduce correlation).
save_theta       If 0, the function does not save θ samples.
save_beta        If 0, the function does not save β samples.
save_lp          If 0, the function does not save the computed log
                 posterior for iterations.
verbose          Verbosity level: 0, 1, or 2.
test_doc_share   Proportion of test documents in the corpus. Must be in
                 [0, 1).
test_word_share  Proportion of test words in each test document. Must be
                 in [0, 1).
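A hypothetical call, assuming docs_tf has already been read from a Blei-format corpus (the reader is not shown here) and that the hyperparameter values below are plausible for the data:

    mc <- lda_cgs(num_topics = 20, vocab_size = 5000, docs_tf = docs_tf,
                  alpha_h = 0.1, eta_h = 0.01,
                  max_iter = 2000, burn_in = 1000, spacing = 10,
                  save_theta = 1, save_beta = 0, save_lp = 1, verbose = 1,
                  test_doc_share = 0.1,    # hold out 10% of documents ...
                  test_word_share = 0.5)   # ... and 50% of words in each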
Details

To compute perplexity, we first partition the words in the corpus into two
sets: (a) a test (held-out) set, selected from the words in the test
(held-out) documents identified via test_doc_share and test_word_share, and
(b) a training set, i.e., the remaining words in the corpus. We then run
the Gibbs sampler based on the training set. Finally, we compute per-word
perplexity based on the held-out set.
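Concretely, if L is the total log-likelihood of the N held-out words, the per-word perplexity is exp(-L / N); a one-line R sketch (names illustrative):

    perplexity <- function(heldout_loglik, n_heldout_words) {
      exp(-heldout_loglik / n_heldout_words)  # lower is better
    }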
Value

The Markov chain output as a list of:

corpus_topic_counts  Corpus-level topic counts from the last iteration of
                     the Markov chain.
theta_counts         Document-level topic counts from the last iteration of
                     the Markov chain.
beta_counts          Topic word counts from the last iteration of the
                     Markov chain.
theta_samples        θ samples after the burn-in period, if save_theta is
                     nonzero.
beta_samples         β samples after the burn-in period, if save_beta is
                     nonzero.
log_posterior        The log posterior (up to a constant) of the hidden
                     variables ψ = (β, θ, z) in the LDA model, if save_lp
                     is nonzero.
perplexity           Perplexity of the set of held-out words.
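Typical post-processing of the returned list (field names as documented above; the layout of theta_samples is an assumption here, a topics x documents x samples array):

    mc$perplexity                        # per-word perplexity of held-out words
    plot(mc$log_posterior, type = "l")   # trace of the log posterior, if saved
    # Posterior mean of theta, assuming a 3-D sample array:
    theta_hat <- apply(mc$theta_samples, c(1, 2), mean)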
See Also

Other MCMC: clda_ags_em, clda_ags_sample_alpha, clda_ags, clda_mgs