Description
This implements the Variational Expectation Maximization (EM) algorithm for the compound latent Dirichlet allocation (cLDA) model.
Usage

clda_vem(num_topics, vocab_size, docs_cid, docs_tf, alpha_h, gamma_h, eta_h,
         vi_max_iter, em_max_iter, vi_conv_thresh, em_conv_thresh, tau_max_iter,
         tau_step_size, estimate_alpha, estimate_gamma, estimate_eta, verbose, init_pi,
         test_doc_share = 0, test_word_share = 0)
Arguments

num_topics: Number of topics in the corpus

vocab_size: Vocabulary size

docs_cid: Document collection IDs (ID indices start at 0)

docs_tf: A list of corpus documents read from the Blei corpus format

alpha_h: Hyperparameter for the collection-level Dirichlets π

gamma_h: Hyperparameter for the document-level Dirichlets θ

eta_h: Hyperparameter for the corpus-level topic Dirichlets β

vi_max_iter: Maximum number of iterations for variational inference

em_max_iter: Maximum number of iterations for variational EM

vi_conv_thresh: Convergence threshold for the document variational inference loop

em_conv_thresh: Convergence threshold for the variational EM loop

tau_max_iter: Maximum number of iterations for the constrained Newton updates of τ

tau_step_size: Step size for the constrained Newton updates of τ

estimate_alpha: If true, run hyperparameter α optimization

estimate_gamma: Dummy parameter [not implemented]

estimate_eta: If true, run hyperparameter η optimization

verbose: Verbosity level (0, 1, 2, or 3)

init_pi: Initial configuration for the collection-level topic mixtures, i.e., π samples

test_doc_share: Proportion of test documents in the corpus; must be in [0, 1)

test_word_share: Proportion of test words in each test document; must be in [0, 1)
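The Blei (LDA-C) corpus format stores each document as one line of term-frequency pairs, `N id:count id:count ...`, with 0-based term IDs. As a minimal illustration of the structures behind `docs_tf` and `docs_cid` (the parser name and toy data below are hypothetical, not part of this package):

```python
def parse_ldac_line(line):
    """Parse one LDA-C line 'N id:count id:count ...' into a
    list of (term_id, count) pairs; term IDs are 0-based."""
    parts = line.split()
    n_unique = int(parts[0])  # declared number of unique terms
    pairs = [tuple(map(int, p.split(":"))) for p in parts[1:]]
    assert len(pairs) == n_unique, "declared unique-term count mismatch"
    return pairs

# Toy two-document corpus; docs_cid assigns each document a
# 0-based collection ID, mirroring the clda_vem arguments.
docs_tf = [parse_ldac_line("3 0:2 4:1 7:3"),
           parse_ldac_line("2 1:1 4:2")]
docs_cid = [0, 0]  # both documents belong to collection 0
```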
Details

To compute perplexity, we first partition the words in a corpus into two sets: (a) a test (held-out) set, selected from the words in the test (held-out) documents, which are identified via test_doc_share and test_word_share, and (b) a training set, i.e., the remaining words in the corpus. We then run the variational EM algorithm on the training set and compute per-word perplexity on the held-out set.
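The train/held-out split described above can be sketched as follows. This is an illustrative Python sketch under assumed conventions (test documents taken from the end of the corpus, uniform random word selection); it is not the package's actual implementation:

```python
import random

def split_heldout(docs, test_doc_share, test_word_share, seed=0):
    """Partition word tokens into training and held-out sets.
    `docs` is a list of documents, each a list of (term_id, count)
    pairs. The last `test_doc_share` fraction of documents are test
    documents; within each, `test_word_share` of tokens are held out."""
    rng = random.Random(seed)
    n_test_docs = int(len(docs) * test_doc_share)
    train, test = [], []
    for d, doc in enumerate(docs):
        tokens = [w for (w, c) in doc for _ in range(c)]  # expand counts
        if d >= len(docs) - n_test_docs:
            rng.shuffle(tokens)
            k = int(len(tokens) * test_word_share)
            test.extend(tokens[:k])    # held-out words for perplexity
            train.extend(tokens[k:])   # remaining words used for fitting
        else:
            train.extend(tokens)       # training documents kept whole
    return train, test

docs = [[(0, 2), (1, 1)], [(0, 1), (2, 3)]]
train, test = split_heldout(docs, test_doc_share=0.5, test_word_share=0.5)
```

With test_doc_share = 0.5, only the second document contributes held-out words; the model is then fit on `train`, and per-word perplexity is evaluated on `test`.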
Value

A list of variational parameters
Note

Created on May 13, 2016