lda_cgs_em_perplexity: LDA: Gibbs-EM with Perplexity Computation


View source: R/RcppExports.R

Description

This function implements the Gibbs-EM algorithm for LDA mentioned in Wallach (2006), Topic Modeling: Beyond Bag-of-Words.

Usage

lda_cgs_em_perplexity(num_topics, vocab_size, docs_tf, alpha_h, eta_h,
  em_max_iter, gibbs_max_iter, burn_in, spacing, save_theta, save_beta, save_lp,
  verbose, test_set_share)

Arguments

num_topics

Number of topics in the corpus

vocab_size

Vocabulary size

docs_tf

A list of corpus documents read from a Blei-format corpus using read_docs (term indices start at 0)

alpha_h

Hyperparameter for θ sampling

eta_h

Smoothing parameter for the β matrix

em_max_iter

Maximum number of EM iterations to be performed

gibbs_max_iter

Maximum number of Gibbs iterations to be performed

burn_in

Burn-in period for the Gibbs sampler

spacing

Spacing between the stored samples (to reduce correlation)

save_theta

If 0, the function does not save θ samples

save_beta

If 0, the function does not save β samples

save_lp

If 0, the function does not save the computed log posterior for each iteration

verbose

Verbosity level; one of 0, 1, or 2

test_set_share

Proportion of test words in each document; must be between 0 and 1
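
A minimal sketch of a call is shown below for orientation. The corpus path, vocabulary size, and hyperparameter values are illustrative assumptions rather than package defaults, and the exact arguments of read_docs may differ from what is shown here.

library(ldamcmc)

# Illustrative settings only; adjust to your corpus.
docs_tf <- read_docs("corpus.ldac")  # Blei-format corpus; term indices start at 0

model <- lda_cgs_em_perplexity(
  num_topics     = 20,
  vocab_size     = 5000,
  docs_tf        = docs_tf,
  alpha_h        = 0.1,    # hyperparameter for theta sampling
  eta_h          = 0.01,   # smoothing parameter for the beta matrix
  em_max_iter    = 10,
  gibbs_max_iter = 1000,
  burn_in        = 500,
  spacing        = 10,
  save_theta     = 0,
  save_beta      = 0,
  save_lp        = 1,
  verbose        = 1,
  test_set_share = 0.2     # hold out 20% of the words in each document
)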

Details

It uses the LDA collapsed Gibbs sampler, a Markov chain on z, for the E-step, and the fixed-point iterations of Minka (2003) to optimize h = (η, α) in the M-step. To compute perplexity, it first partitions each document in the corpus into two sets of words: (a) a test (held-out) set and (b) a training set, according to the user-defined test_set_share. It then runs the Markov chain on the training set and computes perplexity on the held-out set.
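
As a rough sketch of this perplexity computation (not the package's internal code), the predictive probability of each held-out word can be formed from estimates of θ and β obtained from the training-set Markov chain, and perplexity is the exponentiated negative average held-out log-likelihood. All object names below (theta_hat, beta_hat, heldout) are hypothetical:

# Sketch only: theta_hat (D x K) and beta_hat (K x V) are hypothetical estimates
# from the training-set chain; heldout is a hypothetical list with one vector of
# held-out term indices (1-based here) per document.
loglik <- 0
n_words <- 0
for (d in seq_along(heldout)) {
  p_w <- as.vector(theta_hat[d, ] %*% beta_hat)  # predictive word distribution for document d
  loglik <- loglik + sum(log(p_w[heldout[[d]]]))
  n_words <- n_words + length(heldout[[d]])
}
perplexity <- exp(-loglik / n_words)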

Value

The Markov chain output as a list of

corpus_topic_counts

corpus-level topic counts from the last iteration of the Markov chain

theta_counts

document-level topic counts from the last iteration of the Markov chain

beta_counts

topic-word counts from the last iteration of the Markov chain

theta_samples

θ samples after the burn-in period, if save_theta is set

beta_samples

β samples after the burn-in period, if save_beta is set

log_posterior

the log posterior (up to a constant) of the hidden variables ψ = (β, θ, z) in the LDA model, if save_lp is set

perplexity

perplexity of the held-out word set
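
Assuming the returned list is stored in a variable named model (an illustrative name), its components can be inspected as follows; components controlled by the save_* flags are present only when those flags are nonzero.

model$perplexity            # perplexity of the held-out word set
model$corpus_topic_counts   # corpus-level topic counts from the last iteration
str(model$beta_counts)      # topic-word counts from the last iteration
# Present only if the corresponding save_* flags are set:
# model$theta_samples, model$beta_samples, model$log_posterior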

See Also

Other MCMC: lda_acgs_st, lda_cgs_em, lda_cgs_perplexity, lda_fgs_BF_perplexity, lda_fgs_perplexity, lda_fgs_ppc, lda_fgs_st_perplexity

