lda_fgs_perplexity: LDA: Full Gibbs Sampler with Perplexity Computation

Description Usage Arguments Value See Also

View source: R/RcppExports.R

Description

Implements the Full Gibbs sampler for the LDA model, a Markov chain on (β, θ, z). To compute perplexity, it first partitions each document in the corpus into two sets of words: (a) a test (held-out) set and (b) a training set, according to a user-defined test_set_share. It then runs the Markov chain on the training set and computes perplexity on the held-out set.
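As a reminder of the quantity being reported (this formula is not stated in the package documentation, so take it as the conventional definition from the LDA literature rather than a guarantee of the exact estimator used here), held-out perplexity is typically computed as:

```latex
\mathrm{perplexity}(\mathbf{w}^{\mathrm{test}})
  = \exp\!\left(
      - \frac{\sum_{d=1}^{D} \log p\!\left(\mathbf{w}^{\mathrm{test}}_{d} \mid \mathbf{w}^{\mathrm{train}}\right)}
             {\sum_{d=1}^{D} N^{\mathrm{test}}_{d}}
    \right)
```

where N^test_d is the number of held-out words in document d; lower perplexity indicates better predictive fit.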

Usage

lda_fgs_perplexity(num_topics, vocab_size, docs_tf, alpha_h, eta_h, max_iter,
  burn_in, spacing, save_theta, save_beta, save_lp, verbose, test_set_share)

Arguments

num_topics

Number of topics in the corpus

vocab_size

Vocabulary size

docs_tf

A list of corpus documents read from the Blei (LDA-C) corpus format using read_docs (term indices start at 0)

alpha_h

Hyperparameter for θ sampling

eta_h

Smoothing parameter for the β matrix

max_iter

Maximum number of Gibbs iterations to be performed

burn_in

Burn-in-period for the Gibbs sampler

spacing

Spacing between the stored samples (to reduce correlation)

save_theta

If 0, the function does not save θ samples

save_beta

If 0, the function does not save β samples

save_lp

If 0, the function does not save the computed log posterior at each iteration

verbose

Verbosity level: 0, 1, or 2

test_set_share

Proportion of words in each document held out as the test set. Must be between 0 and 1

Value

The Markov chain output as a list of

corpus_topic_counts

Corpus-level topic counts from the last iteration of the Markov chain

theta_counts

Document-level topic counts from the last iteration of the Markov chain

beta_counts

Topic–word counts from the last iteration of the Markov chain

theta_samples

θ samples after the burn-in period, if save_theta is set

beta_samples

β samples after the burn-in period, if save_beta is set

log_posterior

The log posterior (up to a constant multiplier) of the hidden variables ψ = (β, θ, z) in the LDA model, if save_lp is set

perplexity

Perplexity of the held-out word set
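A minimal usage sketch of the call and return value described above. The corpus file name, hyperparameter values, and iteration counts here are illustrative assumptions, not package defaults; only the function signature and the read_docs helper come from this documentation.

```r
library(ldamcmc)

# Read a corpus in the Blei (LDA-C) format; term indices start at 0.
# "corpus.ldac" is a hypothetical file name.
docs_tf <- read_docs("corpus.ldac")

mc <- lda_fgs_perplexity(
  num_topics     = 10,      # assumed number of topics
  vocab_size     = 5000,    # assumed vocabulary size
  docs_tf        = docs_tf,
  alpha_h        = 0.1,     # hyperparameter for θ sampling (illustrative)
  eta_h          = 0.01,    # smoothing parameter for the β matrix (illustrative)
  max_iter       = 2000,
  burn_in        = 1000,
  spacing        = 5,       # thin stored samples to reduce correlation
  save_theta     = 0,       # do not store θ samples
  save_beta      = 0,       # do not store β samples
  save_lp        = 1,       # store the log posterior per iteration
  verbose        = 1,
  test_set_share = 0.2      # hold out 20% of each document's words
)

mc$perplexity    # perplexity of the held-out word set
mc$log_posterior # log posterior trace, since save_lp was set
```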

See Also

Other MCMC: lda_acgs_st, lda_cgs_em_perplexity, lda_cgs_em, lda_cgs_perplexity, lda_fgs_BF_perplexity, lda_fgs_ppc, lda_fgs_st_perplexity


clintpgeorge/ldamcmc documentation built on Feb. 22, 2020, 12:39 p.m.