Description

Implements the LDA serial tempering algorithm. Sampling of the z_{di}'s is adapted from the collapsed Gibbs sampling chain of Griffiths and Steyvers (2004). To compute perplexity, the function first partitions each document in the corpus into two sets of words: (a) a test set (held-out set) and (b) a training set, according to the user-defined test_set_share. It then runs the Markov chain on the training set and computes perplexity on the held-out set.
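The held-out evaluation described above can be sketched in base R as follows. This is a simplified illustration, not the package's internal implementation; the function and variable names here are hypothetical.

```r
# Sketch: split each document's words into training and held-out sets,
# then compute perplexity from per-word predictive log probabilities.
# All names here are illustrative, not part of the package API.

split_document <- function(word_ids, test_set_share) {
  n_test <- floor(length(word_ids) * test_set_share)
  test_idx <- sample(seq_along(word_ids), n_test)
  list(train = word_ids[-test_idx], test = word_ids[test_idx])
}

# Perplexity of held-out words, given their predictive log probabilities:
# perplexity = exp(-mean(log p(w)))
perplexity <- function(log_probs) {
  exp(-mean(log_probs))
}
```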
Usage

lda_acgs_st(num_topics, vocab_size, docs_tf, h_grid, st_grid, st_grid_nbrs,
  init_st_grid_index, zetas, tuning_iter, max_iter_tuning, max_iter_final,
  burn_in, spacing, test_set_share, save_beta, save_theta, save_lp,
  save_hat_ratios, save_tilde_ratios, verbose)
Arguments

num_topics: Number of topics in the corpus.

vocab_size: Vocabulary size.

docs_tf: A list of corpus documents read from the Blei corpus format.

h_grid: A two-dimensional grid of hyperparameters h = (η, α), given as a 2 x G matrix, where G is the number of grid points; the first row holds the α values and the second row the η values.

st_grid: A subgrid of h_grid, in the same 2 x G format, that is used for serial tempering.

st_grid_nbrs: The neighbor indices, from [0, G-1], of each serial-tempering grid point.

init_st_grid_index: Index, from [1, G], of the initial hyperparameter h = (η, α) on the serial-tempering grid.

zetas: Initial guess for the normalization constants.

tuning_iter: Number of tuning iterations.

max_iter_tuning: Maximum number of Gibbs iterations to be performed for each tuning iteration.

max_iter_final: Maximum number of Gibbs iterations to be performed for the final run.

burn_in: Burn-in period for the Gibbs sampler.

spacing: Spacing between stored samples (to reduce correlation).

test_set_share: Proportion of test words in each document; must be between 0 and 1.

save_beta: If 0, the function does not save β samples.

save_theta: If 0, the function does not save θ samples.

save_lp: If 0, the function does not save the computed log posterior for each iteration.

save_hat_ratios: If 0, the function does not save hat ratios for each iteration.

save_tilde_ratios: If 0, the function does not save tilde ratios for each iteration.

verbose: Verbosity level; one of 0, 1, or 2.
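A hypothetical call with illustrative values (a sketch only: docs_tf is assumed to be already loaded, the exact structure of st_grid_nbrs and all numeric settings below are arbitrary examples, not recommendations):

```r
# Illustrative grids: G = 4 hyperparameter settings
# (row 1: alpha values, row 2: eta values)
h_grid <- rbind(alpha = c(0.1, 0.2, 0.3, 0.4),
                eta   = c(0.5, 0.5, 0.5, 0.5))
st_grid <- h_grid                # here the subgrid equals the full grid
st_grid_nbrs <- list(c(1), c(0, 2), c(1, 3), c(2))  # 0-based neighbor indices

mdl <- lda_acgs_st(num_topics = 20, vocab_size = 5000, docs_tf = docs_tf,
                   h_grid = h_grid, st_grid = st_grid,
                   st_grid_nbrs = st_grid_nbrs, init_st_grid_index = 1,
                   zetas = rep(1, ncol(st_grid)), tuning_iter = 5,
                   max_iter_tuning = 1000, max_iter_final = 2000,
                   burn_in = 500, spacing = 10, test_set_share = 0.2,
                   save_beta = 0, save_theta = 0, save_lp = 1,
                   save_hat_ratios = 0, save_tilde_ratios = 0, verbose = 1)

mdl$perplexity  # perplexity of the held-out word set
```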
Value

A list of:

corpus_topic_counts: Corpus-level topic counts from the last iteration of the Markov chain.

theta_counts: Document-level topic counts from the last iteration of the Markov chain.

beta_counts: Topic word counts from the last iteration of the Markov chain.

theta_samples: θ samples after the burn-in period, if save_theta is set.

beta_samples: β samples after the burn-in period, if save_beta is set.

log_posterior: The log posterior (up to a constant multiplier) of the hidden variables ψ = (β, θ, z) in the LDA model, if save_lp is set.

perplexity: Perplexity of the held-out word set.
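If save_lp is set, the returned log-posterior trace can serve as a rough convergence diagnostic. A minimal sketch, assuming mdl is the list returned by the function:

```r
# Trace plot of the (unnormalized) log posterior across stored iterations;
# a flat, stable trace after burn-in suggests the chain has settled.
plot(mdl$log_posterior, type = "l",
     xlab = "stored iteration", ylab = "log posterior (up to a constant)")
```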
Note

Modified on: October 01, 2016 - created; adapted from lda_fgs_st.cpp.
See Also

Other MCMC: lda_cgs_em_perplexity, lda_cgs_em, lda_cgs_perplexity, lda_fgs_BF_perplexity, lda_fgs_perplexity, lda_fgs_ppc, lda_fgs_st_perplexity