Description
This implements a Markov chain on (z, π) via the Metropolis adjusted Langevin algorithm within Gibbs sampler (MGS) for the compound latent Dirichlet allocation (cLDA) model.
Usage

clda_mgs(num_topics, vocab_size, docs_cid, docs_tf, alpha_h, gamma_h, eta_h,
         step_size, max_iter, burn_in, spacing, save_pi, save_theta, save_beta,
         save_lp, verbose, init_pi, test_doc_share = 0, test_word_share = 0,
         burn_in_pi = 10L)
Arguments

num_topics: Number of topics in the corpus

vocab_size: Vocabulary size

docs_cid: Document collection IDs (ID indices start at 0)

docs_tf: A list of corpus documents (term frequencies) read from the Blei corpus format

alpha_h: Hyperparameter for the collection-level topic mixtures π

gamma_h: Hyperparameter for the document-level topic mixtures θ

eta_h: Smoothing parameter for the β matrix

step_size: Step size for the Langevin update of π

max_iter: Maximum number of Gibbs iterations to perform

burn_in: Burn-in period of the Gibbs sampler

spacing: Spacing between stored samples (to reduce autocorrelation)

save_pi: If 0, the function does not save π samples

save_theta: If 0, the function does not save θ samples

save_beta: If 0, the function does not save β samples

save_lp: If 0, the function does not save the computed log posterior for each iteration

verbose: Verbosity level; one of 0, 1, or 2

init_pi: The initial configuration of the collection-level topic mixtures, i.e., the π samples
test_doc_share: Proportion of test (held-out) documents in the corpus; must be in [0, 1)

test_word_share: Proportion of test (held-out) words in each test document; must be in [0, 1)

burn_in_pi: Burn-in period for the Langevin updates of π (default: 10)
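As an illustration, a minimal call might look like the sketch below. The toy corpus, the assumed docs_tf layout (one Blei/LDA-C style term-frequency matrix per document, with 0-based word indices), the assumed K x num-collections shape of init_pi, and all hyperparameter values are placeholders, not recommendations; check the package's corpus readers for the exact structures it expects.

library(clda)  # assumed package name exporting clda_mgs

K <- 4   # number of topics
V <- 50  # vocabulary size

# Toy corpus: 6 documents split across 2 collections (0-based IDs).
docs_cid <- c(0, 0, 0, 1, 1, 1)

# Assumed document layout: one 2-row matrix per document,
# row 1 = 0-based word index, row 2 = term frequency.
set.seed(1)
docs_tf <- lapply(seq_len(6), function(d) {
  words <- sample(0:(V - 1), 8)
  rbind(words, sample(1:3, 8, replace = TRUE))
})

# Uniform initial collection-level topic mixtures
# (assumed layout: K rows, one column per collection).
init_pi <- matrix(1 / K, nrow = K, ncol = 2)

mc <- clda_mgs(
  num_topics = K, vocab_size = V,
  docs_cid = docs_cid, docs_tf = docs_tf,
  alpha_h = 2, gamma_h = 0.5, eta_h = 0.1,  # illustrative hyperparameters
  step_size = 1e-3,                         # MALA step size for the pi updates
  max_iter = 2000, burn_in = 1000, spacing = 5,
  save_pi = 1, save_theta = 0, save_beta = 0, save_lp = 1,
  verbose = 1, init_pi = init_pi
)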
Details

To compute perplexity, we first partition the words in the corpus into two
sets: (a) a test (held-out) set, selected from the words of the test
(held-out) documents identified via test_doc_share and test_word_share, and
(b) a training set, i.e., the remaining words in the corpus. We then run the
sampler on the training set and finally compute per-word perplexity on the
held-out set.
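For example, the following sketch (reusing the toy objects from the Arguments example above) holds out 20% of the documents and 50% of the words within each held-out document; all values are illustrative.

mc_heldout <- clda_mgs(
  num_topics = K, vocab_size = V,
  docs_cid = docs_cid, docs_tf = docs_tf,
  alpha_h = 2, gamma_h = 0.5, eta_h = 0.1,
  step_size = 1e-3,
  max_iter = 2000, burn_in = 1000, spacing = 5,
  save_pi = 0, save_theta = 0, save_beta = 0, save_lp = 0,
  verbose = 1, init_pi = init_pi,
  test_doc_share = 0.2, test_word_share = 0.5
)
mc_heldout$perplexity  # per-word perplexity on the held-out words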
Value

The Markov chain output, as a list of:

corpus_topic_counts: Corpus-level topic counts from the last iteration of the Markov chain

pi_counts: Collection-level topic counts from the last iteration of the Markov chain

theta_counts: Document-level topic counts from the last iteration of the Markov chain

beta_counts: Topic word counts from the last iteration of the Markov chain

pi_samples: π samples after the burn-in period, if save_pi is nonzero

theta_samples: θ samples after the burn-in period, if save_theta is nonzero

beta_samples: β samples after the burn-in period, if save_beta is nonzero

log_posterior: The log posterior (up to an additive constant) of the hidden variables ψ = (β, π, θ, z) in the cLDA model, if save_lp is nonzero

log_posterior_pi_z: The log posterior (up to an additive constant) of the hidden variables (π, z) in the cLDA model, if save_lp is nonzero

perplexity: Perplexity of the held-out word set
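A quick way to use this output is to inspect the log-posterior trace as a convergence diagnostic; the sketch below assumes log_posterior is a numeric vector with one entry per saved iteration (it is only present when save_lp is nonzero in the call).

# Trace plot of the unnormalized log posterior from the earlier run.
plot(mc$log_posterior, type = "l",
     xlab = "saved iteration", ylab = "log posterior (up to a constant)")

# The last-iteration count matrices are always returned; inspect their shapes.
str(mc$corpus_topic_counts)
str(mc$beta_counts)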
Note

A Leapling

Created on: February 29, 2016
Modified on: May 18, 2016
Created by: Clint P. George
See Also

Other MCMC: clda_ags_em, clda_ags_sample_alpha, clda_ags, lda_cgs