lda_acgs_st: LDA: Serial Tempering with Perplexity Computation


View source: R/RcppExports.R

Description

Implements the LDA serial tempering algorithm. Sampling of the z_{di}'s is adapted from the collapsed Gibbs sampler of Griffiths and Steyvers (2004). To compute perplexity, the function first partitions each document in the corpus into two sets of words, according to a user-defined test_set_share: (a) a test (held-out) set and (b) a training set. It then runs the Markov chain on the training set and computes the perplexity of the held-out set.
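The per-document split described above can be sketched in base R as follows. This is illustrative only: the actual partitioning happens inside the C++ sampler, and `split_document` is a hypothetical helper, not part of the package.

```r
# Sketch of the per-document train/held-out split, assuming a uniformly
# random choice of held-out positions (an assumption; the package's C++
# code may partition differently).
split_document <- function(tokens, test_set_share) {
  n <- length(tokens)
  n_test <- floor(n * test_set_share)                  # held-out word count
  test_idx <- sample.int(n, n_test)                    # random held-out slots
  list(train = tokens[setdiff(seq_len(n), test_idx)],  # Markov chain runs here
       test  = tokens[test_idx])                       # perplexity computed here
}

set.seed(1)
doc <- c(0L, 4L, 4L, 7L, 2L, 9L, 9L, 1L)  # term indices start at 0
parts <- split_document(doc, test_set_share = 0.25)
```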

Usage

lda_acgs_st(num_topics, vocab_size, docs_tf, h_grid, st_grid, st_grid_nbrs,
  init_st_grid_index, zetas, tuning_iter, max_iter_tuning, max_iter_final,
  burn_in, spacing, test_set_share, save_beta, save_theta, save_lp,
  save_hat_ratios, save_tilde_ratios, verbose)

Arguments

num_topics

Number of topics in the corpus

vocab_size

Vocabulary size

docs_tf

A list of corpus documents read from a Blei-format corpus using read_docs (term indices start at 0)

h_grid

A two-dimensional grid of hyperparameters h = (η, α), stored as a 2 x G matrix, where G is the number of grid points; the first row holds the α values and the second row holds the η values

st_grid

A two-dimensional grid of hyperparameters h = (η, α), stored as a 2 x G matrix, where G is the number of grid points; the first row holds the α values and the second row holds the η values. This is a subgrid of h_grid that is used for serial tempering

st_grid_nbrs

The neighbor indices (0-based, in [0, G-1]) of each helper grid point

init_st_grid_index

Index into the helper h grid (1-based, in [1, G]) of the initial hyperparameter h = (η, α)

zetas

Initial guess for normalization constants

tuning_iter

Number of tuning iterations

max_iter_tuning

Maximum number of Gibbs iterations to be performed for the tuning iterations

max_iter_final

Maximum number of Gibbs iterations to be performed for the final run

burn_in

Burn-in-period for the Gibbs sampler

spacing

Spacing between the stored samples (to reduce correlation)

test_set_share

Proportion of test words in each document; must be between 0 and 1

save_beta

If 0, the function does not save β samples

save_theta

If 0, the function does not save θ samples

save_lp

If 0, the function does not save the computed log posterior for each iteration

save_hat_ratios

If 0, the function does not save hat ratios for each iteration

save_tilde_ratios

If 0, the function does not save tilde ratios for each iteration

verbose

Verbosity level; one of 0, 1, or 2
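A hedged sketch of how the grid arguments might be assembled. All grid values, the neighbor structure, the corpus file name, and the call's iteration counts below are hypothetical illustrations, not values recommended by the package; the final call assumes the ldamcmc package and a Blei-format corpus file are available, so it is left not run.

```r
# Illustrative construction of h_grid and st_grid (first row alpha,
# second row eta); the specific values are arbitrary.
alpha_vals <- seq(0.05, 0.50, by = 0.05)   # 10 alpha values
eta_vals   <- seq(0.10, 1.00, by = 0.10)   # 10 eta values
grid_df <- expand.grid(alpha = alpha_vals, eta = eta_vals)
h_grid  <- rbind(alpha = grid_df$alpha, eta = grid_df$eta)  # 2 x 100 matrix

# A coarser subgrid of h_grid for serial tempering
st_df   <- expand.grid(alpha = alpha_vals[c(1, 5, 10)],
                       eta   = eta_vals[c(1, 5, 10)])
st_grid <- rbind(alpha = st_df$alpha, eta = st_df$eta)      # 2 x 9 matrix

# Hypothetical neighbor lists, as 0-based indices into the subgrid:
# here simply the adjacent points along the flattened index (the real
# neighborhood structure on the 2-D grid may differ).
G <- ncol(st_grid)
st_grid_nbrs <- lapply(seq_len(G) - 1L, function(g)
  intersect(c(g - 1L, g + 1L), 0:(G - 1L)))

## Not run (requires the ldamcmc package and a Blei-format corpus):
# docs_tf <- read_docs("corpus.ldac")   # hypothetical file name
# fit <- lda_acgs_st(num_topics = 20, vocab_size = 5000, docs_tf = docs_tf,
#   h_grid = h_grid, st_grid = st_grid, st_grid_nbrs = st_grid_nbrs,
#   init_st_grid_index = 1, zetas = rep(1, ncol(st_grid)),
#   tuning_iter = 5, max_iter_tuning = 100, max_iter_final = 1000,
#   burn_in = 100, spacing = 5, test_set_share = 0.2,
#   save_beta = 0, save_theta = 0, save_lp = 1,
#   save_hat_ratios = 0, save_tilde_ratios = 0, verbose = 1)
```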

Value

A list containing:

corpus_topic_counts

Corpus-level topic counts from the last iteration of the Markov chain

theta_counts

Document-level topic counts from the last iteration of the Markov chain

beta_counts

Topic word counts from the last iteration of the Markov chain

theta_samples

θ samples after the burn-in period, if save_theta is set

beta_samples

β samples after the burn-in period, if save_beta is set

log_posterior

The log posterior (up to an additive constant) of the hidden variables ψ = (β, θ, z) in the LDA model, if save_lp is set

perplexity

Perplexity of the held-out word set
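The held-out perplexity is presumably computed with the standard definition below, where w_d^test denotes the held-out words of document d and N_d^test their count; this formula is an assumption consistent with the description above, so check the source for the exact estimator used.

```latex
\mathrm{perplexity}\left(\mathbf{w}^{\mathrm{test}}\right)
  = \exp\!\left( - \frac{\sum_{d=1}^{D} \log p\!\left(\mathbf{w}_d^{\mathrm{test}}\right)}
                        {\sum_{d=1}^{D} N_d^{\mathrm{test}}} \right)
```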

Note

Modified on:

October 01, 2016 - Created; adapted from lda_fgs_st.cpp

See Also

Other MCMC: lda_cgs_em_perplexity, lda_cgs_em, lda_cgs_perplexity, lda_fgs_BF_perplexity, lda_fgs_perplexity, lda_fgs_ppc, lda_fgs_st_perplexity


clintpgeorge/ldamcmc documentation built on Feb. 22, 2020, 12:39 p.m.