Description Usage Arguments Details Value Note See Also
This implements a Markov chain on (z, π) via collapsed Gibbs sampling with auxiliary-variable updates for the compound latent Dirichlet allocation (cLDA) model.
clda_ags_sample_alpha(num_topics, vocab_size, docs_cid, docs_tf, alpha_h,
  gamma_h, eta_h, max_iter, burn_in, spacing, save_pi, save_theta, save_beta,
  save_lp, verbose, init_pi, test_doc_share = 0, test_word_share = 0,
  burn_in_pi = 10L, sample_alpha_h = FALSE, gamma_shape = 1,
  gamma_rate = 1)
num_topics: Number of topics in the corpus
vocab_size: Vocabulary size
docs_cid: Collection ID for each document in the corpus (indices start at 0)
docs_tf: Corpus documents in the Blei corpus format
alpha_h: Hyperparameter for π
gamma_h: Hyperparameter for θ
eta_h: Hyperparameter for β
max_iter: Maximum number of Gibbs iterations to perform
burn_in: Burn-in period for the Gibbs sampler
spacing: Spacing between stored samples (to reduce autocorrelation)
save_pi: If 0, the function does not save π samples
save_theta: If 0, the function does not save θ samples
save_beta: If 0, the function does not save β samples
save_lp: If 0, the function does not save the computed log posterior at each iteration
verbose: Verbosity level: 0, 1, or 2
init_pi: Initial configuration of the collection-level topic mixtures, i.e., π
test_doc_share: Proportion of test documents in the corpus; must be in [0, 1)
test_word_share: Proportion of test words in each test document; must be in [0, 1)
burn_in_pi: Number of burn-in iterations before π sampling begins
sample_alpha_h: Whether to sample the hyperparameter α (TRUE) or not (FALSE)
gamma_shape: Shape hyperparameter of the Gamma prior used when sampling α
gamma_rate: Rate hyperparameter of the Gamma prior used when sampling α
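The interaction of max_iter, burn_in, and spacing determines which iterations contribute stored samples. The package does not document its exact indexing convention, so the sketch below shows a common thinning scheme as an assumption (the helper name and the "keep every spacing-th post-burn-in iteration" rule are illustrative, not the package's verified behavior):

```python
def kept_iterations(max_iter, burn_in, spacing):
    # A common MCMC thinning convention (an assumption; the package may
    # index iterations differently): discard the first `burn_in` iterations,
    # then keep every `spacing`-th iteration to reduce autocorrelation
    # between stored samples.
    return [it for it in range(1, max_iter + 1)
            if it > burn_in and (it - burn_in) % spacing == 0]

# With max_iter = 10, burn_in = 4, spacing = 2, iterations 6, 8, and 10
# would be stored under this convention.
print(kept_iterations(10, 4, 2))
```

Under this convention, roughly (max_iter - burn_in) / spacing samples are stored per saved quantity.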
To compute perplexity, we first partition the words in a corpus into two sets: (a) a test (held-out) set, selected from the words in the test (held-out) documents (identified via test_doc_share and test_word_share), and (b) a training set, i.e., the remaining words in the corpus. We then run the Gibbs sampler on the training set. Finally, we compute per-word perplexity on the held-out set.
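The per-word perplexity mentioned above is the standard exponentiated negative average log-likelihood of the held-out words. A minimal sketch (the function name is illustrative; the package computes the held-out log-likelihoods internally from the trained model):

```python
import math

def per_word_perplexity(log_probs):
    # log_probs: log-likelihood of each held-out word under the trained model.
    # Per-word perplexity = exp(-average log-likelihood); lower is better.
    n = len(log_probs)
    return math.exp(-sum(log_probs) / n)

# Sanity check: if every held-out word has probability 1/4 under the model,
# the perplexity equals 4.
print(per_word_perplexity([math.log(0.25)] * 4))
```

A model that assigned uniform probability 1/V to every word would score a perplexity of V, so values well below the vocabulary size indicate the model has learned useful structure.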
A list of:
corpus_topic_counts: corpus-level topic counts from the last iteration of the Markov chain
pi_counts: collection-level topic counts from the last iteration of the Markov chain
theta_counts: document-level topic counts from the last iteration of the Markov chain
beta_counts: topic word counts from the last iteration of the Markov chain
pi_samples: π samples after the burn-in period, if save_pi is set
theta_samples: θ samples after the burn-in period, if save_theta is set
beta_samples: β samples after the burn-in period, if save_beta is set
log_posterior: the log posterior (up to an additive constant) of the hidden variables ψ = (β, π, θ, z) in the cLDA model, if save_lp is set
log_posterior_pi_z: the log posterior (up to an additive constant) of the hidden variables (π, z) in the cLDA model, if save_lp is set
perplexity: perplexity of the set of held-out words
Updated on: December 17, 2017 – Added hyperparameter alpha sampling
Updated on: June 02, 2016
Created on: May 18, 2016
Created by: Clint P. George
Other MCMC: clda_ags_em, clda_ags, clda_mgs, lda_cgs