lda_cgs: LDA: Collapsed Gibbs Sampler with Perplexity Computation

Description

This implements the collapsed Gibbs sampler for the LDA model, i.e., a Markov chain on the topic assignments z.

Usage

lda_cgs(num_topics, vocab_size, docs_tf, alpha_h, eta_h, max_iter, burn_in,
  spacing, save_theta, save_beta, save_lp, verbose, test_doc_share = 0,
  test_word_share = 0)

Arguments

num_topics

Number of topics in the corpus

vocab_size

Vocabulary size

docs_tf

A list of corpus documents read from a Blei-format corpus using read_docs (term indices start at 0)

alpha_h

Hyperparameter for θ sampling

eta_h

Smoothing parameter for the β matrix

max_iter

Maximum number of Gibbs iterations to be performed

burn_in

Burn-in period for the Gibbs sampler

spacing

Spacing between the stored samples (to reduce correlation)

save_theta

If 0, the function does not save θ samples

save_beta

If 0, the function does not save β samples

save_lp

If 0, the function does not save the computed log posterior for each iteration

verbose

Verbosity level: 0, 1, or 2

test_doc_share

Proportion of test documents in the corpus. Must be in [0, 1)

test_word_share

Proportion of test words in each test document. Must be in [0, 1)

Details

To compute perplexity, we first partition the words in a corpus into two sets: (a) a test (held-out) set, selected from the words in the test (held-out) documents identified via test_doc_share and test_word_share, and (b) a training set, i.e., the remaining words in the corpus. We then run the collapsed Gibbs sampler on the training set. Finally, we compute per-word perplexity on the held-out set.
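
The following is a minimal sketch of such a perplexity run. It assumes the clda package is attached, that "corpus.ldac" is a placeholder path to a Blei-format corpus, that read_docs accepts that path (its exact interface may differ), and that all hyperparameter and iteration values are purely illustrative.

library(clda)

## Read a Blei-format corpus; the file path and the read_docs
## interface shown here are assumptions, not part of this page.
docs_tf <- read_docs("corpus.ldac")

## Hold out 20% of documents; within each held-out document,
## hold out 50% of its words for the perplexity computation.
mc <- lda_cgs(num_topics = 20,
              vocab_size = 5000,      # must match the vocabulary of docs_tf
              docs_tf = docs_tf,
              alpha_h = 0.1,          # hyperparameter for theta sampling
              eta_h = 0.01,           # smoothing parameter for the beta matrix
              max_iter = 2000,
              burn_in = 1000,
              spacing = 10,
              save_theta = 1,
              save_beta = 0,
              save_lp = 1,
              verbose = 1,
              test_doc_share = 0.2,
              test_word_share = 0.5)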

Value

The Markov chain output as a list of

corpus_topic_counts

corpus-level topic counts from the last iteration of the Markov chain

theta_counts

document-level topic counts from the last iteration of the Markov chain

beta_counts

topic-word counts from the last iteration of the Markov chain

theta_samples

θ samples after the burn-in period, if save_theta is set

beta_samples

β samples after the burn-in period, if save_beta is set

log_posterior

the log posterior (up to a constant) of the hidden variables ψ = (β, θ, z) in the LDA model, if save_lp is set

perplexity

perplexity of the held-out word set
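
Assuming mc holds the list returned by the call sketched in Details, its components can be inspected as follows (a minimal sketch; the exact shapes of the components are not documented here).

## Held-out per-word perplexity (meaningful only when the test shares are > 0)
mc$perplexity

## Count matrices from the last iteration of the chain
dim(mc$beta_counts)       # topic-word counts
dim(mc$theta_counts)      # document-level topic counts

## Saved theta samples (present because save_theta was set in the call above)
str(mc$theta_samples)

## Trace of the log posterior over the saved iterations
plot(mc$log_posterior, type = "l",
     xlab = "saved iteration", ylab = "log posterior")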

See Also

Other MCMC: clda_ags_em, clda_ags_sample_alpha, clda_ags, clda_mgs

