gen_synth_clda_corpus: Generates a Synthetic c-LDA Corpus

Description

Generates a synthetic corpus of documents by following the c-LDA generative process with user-specified sizes and hyperparameters.

Usage

gen_synth_clda_corpus(K, V, J, collection.size, doc.size, alpha.h, gamma.h,
  eta.h)

Arguments

K

number of topics

V

vocabulary size

J

number of collections

collection.size

number of documents in each collection (a vector of J elements)

doc.size

number of words in each document

alpha.h

hyperparameter for collection-level Dirichlet sampling

gamma.h

hyperparameter for document-level Dirichlet sampling

eta.h

hyperparameter for topic Dirichlet sampling

Details

Last modified on: March 04, 2016
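
The generative process named in the Description can be sketched roughly as follows. This is an illustrative sketch only, not the package's implementation: the names rdirichlet1 and sketch_clda are hypothetical, the returned list layout is made up for the example, and the hierarchy (a symmetric Dirichlet(alpha.h) draw per collection, then a document-level draw from a Dirichlet with parameter gamma.h times the collection mixture) is an assumption inferred from the argument descriptions.

```r
# Hypothetical helper: one Dirichlet draw via normalized Gamma variates.
rdirichlet1 <- function(a) {
  g <- rgamma(length(a), shape = a)
  g <- pmax(g, .Machine$double.xmin) # guard against underflow to all zeros
  g / sum(g)
}

# Hypothetical sketch of a c-LDA-style generator (assumed hierarchy, see above).
sketch_clda <- function(K, V, J, collection.size, doc.size,
                        alpha.h, gamma.h, eta.h) {
  # Topic-word distributions: one Dirichlet(eta.h) draw over V words per topic.
  beta <- t(vapply(seq_len(K), function(k) rdirichlet1(rep(eta.h, V)),
                   numeric(V)))
  docs <- list()
  cid  <- integer(0)
  for (j in seq_len(J)) {
    # Collection-level topic mixture (assumed symmetric Dirichlet(alpha.h)).
    pi.j <- rdirichlet1(rep(alpha.h, K))
    for (d in seq_len(collection.size[j])) {
      # Document-level mixture, assumed to be Dirichlet(gamma.h * pi.j).
      theta <- rdirichlet1(gamma.h * pi.j)
      # One topic per word position, then one word id from that topic.
      z <- sample.int(K, doc.size, replace = TRUE, prob = theta)
      w <- vapply(z, function(k) sample.int(V, 1, prob = beta[k, ]),
                  integer(1))
      docs[[length(docs) + 1]] <- w
      cid <- c(cid, j)
    }
  }
  list(docs = docs, collection = cid, beta = beta)
}
```

Under these assumptions, smaller values of gamma.h and eta.h would tend to produce documents dominated by fewer topics and topics concentrated on fewer words, respectively.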

Value

a list containing the generated corpus and its statistics

See Also

Other corpus: gen_synth_clda_corpus_pi

Examples

## Generates documents with given parameters

J                  <- 2
K                  <- 4
V                  <- 20
alpha.h            <- 2
gamma.h            <- .2
eta.h              <- .25
doc.size           <- 80
collection.size    <- c(40, 40) # number of documents in each collection
D                  <- sum(collection.size) # total number of documents (assumed meaning of D)
ds.name            <- paste("synth-J", J, "-K", K, "-D", D, "-V", V, sep = "")

ds                 <- gen_synth_clda_corpus(K, V, J, collection.size, doc.size, alpha.h, gamma.h, eta.h)

clintpgeorge/clda documentation built on May 13, 2019, 8 p.m.