gen_synth_corpus: Generates a synthetic corpus

View source: R/gen_synth_corpus.R

Description

Generates the words of each document by following the LDA generative process for a given topic-word matrix beta. It is used to test the correctness of the Gibbs sampling algorithms.
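
The LDA generative process referred to above can be sketched in base R as follows. This is a minimal illustration and not the package's implementation: the helper name sketch_lda_corpus, the Poisson draw for document length, the floor of one word per document, and the shape of the returned list are all assumptions made for the example.

```r
# Hypothetical helper, not part of ldamcmc: a base-R sketch of the
# LDA generative process for a fixed topic-word matrix beta (K x V).
sketch_lda_corpus <- function(D, lambda.hat, alpha.v, beta) {
  K <- nrow(beta)
  V <- ncol(beta)
  lapply(seq_len(D), function(d) {
    # Document length: a Poisson draw is assumed here, floored at one word.
    N <- max(1L, rpois(1, lambda.hat))
    # theta_d ~ Dirichlet(alpha.v), obtained by normalizing gamma draws.
    g <- rgamma(K, shape = as.vector(alpha.v), rate = 1)
    theta <- g / sum(g)
    # z_dn ~ Categorical(theta_d): one topic per word position.
    z <- sample.int(K, size = N, replace = TRUE, prob = theta)
    # w_dn ~ Categorical(beta[z_dn, ]): one vocabulary index per word position.
    w <- vapply(z, function(k) sample.int(V, size = 1, prob = beta[k, ]),
                integer(1))
    list(theta = theta, topics = z, words = w)
  })
}

set.seed(1)
K <- 2
V <- 20
# Toy beta: topic 1 only emits words 1..10, topic 2 only emits words 11..20.
beta.toy <- rbind(c(rep(0.1, 10), rep(0, 10)),
                  c(rep(0, 10), rep(0.1, 10)))
docs <- sketch_lda_corpus(D = 5, lambda.hat = 80,
                          alpha.v = array(7, c(K, 1)), beta = beta.toy)
```

Because the two toy topics use disjoint halves of the vocabulary, every sampled word index reveals which topic produced it, which makes this kind of corpus convenient for checking a sampler's output.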

Usage

gen_synth_corpus(D, lambda.hat, alpha.v, beta)

Arguments

D

the number of documents in the corpus

lambda.hat

the mean number of words per document

alpha.v

the vector of Dirichlet hyperparameters (K x 1) for document topic mixtures

beta

the beta matrix (counts) for topic word probabilities (K x V format)
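
As an illustration of the K x V layout, the snippet below builds a beta matrix in base R without any extra packages. It is a sketch under one assumption: each row of beta is treated as a probability vector over the vocabulary (the "(counts)" note above leaves this open). The names eta and g are made up for the example.

```r
set.seed(42)
K <- 2    # number of topics (rows of beta)
V <- 20   # vocabulary size (columns of beta)
eta <- 3  # symmetric Dirichlet hyperparameter, the value used in the Examples

# Row k ~ Dirichlet(eta, ..., eta): draw independent gammas, then
# normalize each row so that it sums to one.
g <- matrix(rgamma(K * V, shape = eta, rate = 1), nrow = K, ncol = V)
beta.m <- g / rowSums(g)

dim(beta.m)      # K rows (topics), V columns (vocabulary terms)
rowSums(beta.m)  # each row sums to 1 up to floating-point error
```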

Value

a list of generated documents' details

Examples

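library(ldamcmc)
library(MCMCpack)  # assumption: rdirichlet() is taken from MCMCpack here

K              <- 2    # number of topics
V              <- 20   # vocabulary size
D              <- 100  # number of documents
gen.alpha.v    <- array(7, c(K, 1))  # Dirichlet hyperparameters for topic mixtures
gen.eta.v      <- array(3, c(1, V))  # Dirichlet hyperparameters for topic-word distributions
lambda.hat     <- 80

## Generates the synthetic beta.m
beta.m         <- matrix(1e-2, nrow=K, ncol=V)
beta.m[1, ]    <- rdirichlet(1, gen.eta.v)
beta.m[2, ]    <- rdirichlet(1, gen.eta.v)

## Generates documents with a given beta.m
ds             <- gen_synth_corpus(D, lambda.hat, gen.alpha.v, beta.m)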

clintpgeorge/ldamcmc documentation built on Feb. 22, 2020, 12:39 p.m.