Description
This function implements the collapsed Gibbs sampler for the LDA model, a Markov chain on z. To compute perplexity, it first partitions each document in the corpus into two sets of words: (a) a test (held-out) set and (b) a training set, according to a user-defined test_set_share. It then runs the Markov chain on the training set and computes perplexity on the held-out set.
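To make the split concrete, here is a rough sketch of the per-document partition that test_set_share implies; this illustrates the idea and is not the package's internal code:

  # Illustrative only: split one document's word tokens into a training
  # set and a held-out (test) set according to test_set_share.
  split_document <- function(word_tokens, test_set_share = 0.2) {
    n_test   <- floor(length(word_tokens) * test_set_share)
    test_idx <- sample(seq_along(word_tokens), n_test)
    list(train = word_tokens[-test_idx],
         test  = word_tokens[test_idx])
  }

  # Example: hold out 20% of a 10-token document
  parts <- split_document(c(3, 7, 7, 1, 9, 2, 5, 5, 8, 4), 0.2)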
Usage

  lda_cgs_perplexity(num_topics, vocab_size, docs_tf, alpha_h, eta_h, max_iter,
                     burn_in, spacing, save_theta, save_beta, save_lp, verbose,
                     test_set_share)
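A minimal sketch of a call, assuming docs_tf is already loaded in the Blei corpus format; the hyperparameter and iteration settings below are illustrative choices, not package defaults:

  num_topics <- 20
  vocab_size <- 5000   # size of the corpus vocabulary

  model <- lda_cgs_perplexity(
    num_topics     = num_topics,
    vocab_size     = vocab_size,
    docs_tf        = docs_tf,          # assumed already loaded
    alpha_h        = 50 / num_topics,  # a common heuristic for the theta prior
    eta_h          = 0.1,              # smoothing for the beta matrix
    max_iter       = 2000,
    burn_in        = 1000,
    spacing        = 10,               # keep every 10th post-burn-in sample
    save_theta     = 1,
    save_beta      = 0,
    save_lp        = 1,
    verbose        = 1,
    test_set_share = 0.2               # hold out 20% of each document's words
  )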
Arguments

num_topics: Number of topics in the corpus
vocab_size: Vocabulary size
docs_tf: A list of corpus documents read from the Blei corpus format
alpha_h: Hyperparameter for θ sampling
eta_h: Smoothing parameter for the β matrix
max_iter: Maximum number of Gibbs iterations to be performed
burn_in: Burn-in period for the Gibbs sampler
spacing: Spacing between stored samples (to reduce correlation)
save_theta: If 0, the function does not save θ samples
save_beta: If 0, the function does not save β samples
save_lp: If 0, the function does not save the computed log posterior for each iteration
verbose: Verbosity level; one of 0, 1, or 2
test_set_share: Proportion of test words in each document; must be between 0 and 1
Value

The Markov chain output as a list of:
corpus_topic_counts: Corpus-level topic counts from the last iteration of the Markov chain
theta_counts: Document-level topic counts from the last iteration of the Markov chain
beta_counts: Topic word counts from the last iteration of the Markov chain
theta_samples: θ samples after the burn-in period, if save_theta is nonzero
beta_samples: β samples after the burn-in period, if save_beta is nonzero
log_posterior: The log posterior (up to a constant multiplier) of the hidden variables ψ = (β, θ, z) in the LDA model, if save_lp is nonzero
perplexity: Perplexity of the held-out word set
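As a quick illustration of how the returned list might be inspected, assuming the sketch call above with save_lp = 1, and assuming log_posterior is a numeric vector of per-saved-iteration values (the documentation does not specify its structure):

  model$perplexity   # perplexity of the held-out word set

  # Trace of the log posterior to eyeball convergence (requires save_lp != 0)
  plot(model$log_posterior, type = "l",
       xlab = "saved iteration", ylab = "log posterior (up to a constant)")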
See Also

Other MCMC: lda_acgs_st, lda_cgs_em_perplexity, lda_cgs_em, lda_fgs_BF_perplexity, lda_fgs_perplexity, lda_fgs_ppc, lda_fgs_st_perplexity