View source: R/gen_synth_corpus.R
Generates documents using the LDA generative process, based on a set of (α, η) configurations.
gen_synth_corpus_multi_alpha(K, V, J, collection.size, doc.size, alpha.vec,
  eta.h)
K | number of topics
V | vocabulary size
J | number of collections
collection.size | number of documents in each collection (a list of J elements)
doc.size | number of words in each document
alpha.vec | a set of values for document-level Dirichlet sampling
eta.h | a set of values for topic Dirichlet sampling
Last modified on: April 15, 2016
A list containing the generated corpora and their statistics.
Other corpus: calc_doc_cos, calc_doc_tf, gen_synth_corpus2, gen_synth_corpus_multi_h
## Generates documents with given parameters
J <- 2
K <- 4
V <- 20
alpha.vec <- c(.2, 2)
eta.h <- .5 # named to match the eta.h argument passed in the call below
doc.size <- 80
collection.size <- c(40, 40) # number of documents in each collection
ds.name <- paste("synth-J", J, "-K", K, "-V", V, sep = "")
ds <- gen_synth_corpus_multi_alpha(K, V, J, collection.size, doc.size, alpha.vec, eta.h)
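The description refers to the LDA generative process without spelling it out. The following is a minimal base-R sketch of that process for a single collection and a single (α, η) pair. It is not the package's implementation: `rdirichlet_sym` and `sketch_lda_docs` are hypothetical names introduced here for illustration, and how `gen_synth_corpus_multi_alpha` maps `alpha.vec` onto the J collections is not documented on this page, so it is not modelled.

```r
# Hypothetical helper (not part of the package): one draw from a symmetric
# Dirichlet of dimension n, obtained by normalizing independent Gamma draws.
rdirichlet_sym <- function(n, conc) {
  g <- rgamma(n, shape = conc, rate = 1)
  g / sum(g)
}

# Hypothetical sketch of the standard LDA generative process for one collection.
# Assumes symmetric Dirichlet priors with scalar alpha and eta.
sketch_lda_docs <- function(K, V, n.docs, doc.size, alpha, eta) {
  # Topic-word distributions: one Dirichlet(eta) draw over the vocabulary per topic
  beta <- t(replicate(K, rdirichlet_sym(V, eta)))  # K x V matrix
  docs <- vector("list", n.docs)
  for (d in seq_len(n.docs)) {
    # Document-topic proportions: one Dirichlet(alpha) draw over the K topics
    theta <- rdirichlet_sym(K, alpha)
    # Topic assignment for each word position
    z <- sample.int(K, doc.size, replace = TRUE, prob = theta)
    # Word id for each position, drawn from the assigned topic's distribution
    docs[[d]] <- vapply(z, function(k) sample.int(V, 1, prob = beta[k, ]),
                        integer(1))
  }
  list(beta = beta, docs = docs)
}

set.seed(1)
res <- sketch_lda_docs(K = 4, V = 20, n.docs = 40, doc.size = 80,
                       alpha = 0.2, eta = 0.5)
```

Smaller `alpha` values tend to concentrate each document on fewer topics, which is presumably why the example above contrasts `.2` with `2` in `alpha.vec`.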