| create_lexicon | R Documentation |
Performs one run of the Gibbs sampler, plus related setup, to initialize the objects used for sampling.
Works in concert with initialize_topic_counts.
create_lexicon(Cd_in, Beta_in, dtm_in, alpha, freeze_topics)
Cd_in | IntegerMatrix denoting counts of topics in documents
Beta_in | NumericMatrix denoting probability of words in topics
dtm_in | arma::sp_mat document term matrix
alpha | NumericVector prior for topics over documents
freeze_topics | bool if making predictions, set to
Arguments ending in _in are copied and their copies modified in
some way by this function. In the case of Cd_in and Beta_in,
the only modification is that they are converted from matrices to nested
std::vector for speed, reliability, and thread safety. dtm_in
is transposed for speed when looping over columns.
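As an illustration of the matrix-to-nested-vector conversion described above, here is a minimal sketch using only the C++ standard library. The function name to_nested and the flat-buffer input are hypothetical and are not taken from the package source; the only fact relied on is that R stores matrices in column-major order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the package's own code): copy a column-major
// buffer, the layout R uses for matrices, into nested vectors that are
// indexed as out[row][col].
std::vector<std::vector<double>> to_nested(const std::vector<double>& flat,
                                           std::size_t n_row,
                                           std::size_t n_col) {
  assert(flat.size() == n_row * n_col);
  std::vector<std::vector<double>> out(n_row, std::vector<double>(n_col));
  for (std::size_t c = 0; c < n_col; ++c) {
    for (std::size_t r = 0; r < n_row; ++r) {
      // Element (r, c) of a column-major matrix sits at offset c * n_row + r.
      out[r][c] = flat[c * n_row + r];
    }
  }
  return out;
}
```

Because the result is a fresh copy, later changes to it do not touch the original R object, which is consistent with the copy semantics described for the _in arguments.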
Returns a list with five entries.
Docs is a list of vectors. Each element is a document, and the contents
are indices for tokens. Used as an iterator for the Gibbs sampler.
Zd is a list of vectors, similar to Docs. However, its contents are topic
assignments of each document/token pair. Used as an iterator for Gibbs
sampling.
Cd is a matrix counting the number of times each topic is sampled per
document.
Cv is a matrix counting the number of times each topic is sampled per token.
Ck is a vector counting the total number of times each topic is sampled overall.
Cd, Cv, and Ck are all derived from Zd; Cv additionally relies on the token indices in Docs.
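The relationship between the five entries can be sketched in plain C++. This is not the package's implementation: the struct TopicCounts, the function names, and the index layouts (Cd as [document][topic], Cv as [topic][token]) are assumptions made for illustration. The sketch shows how one row of term counts could be expanded into the token indices held in Docs, and how Cd, Cv, and Ck can be tallied from Docs and Zd.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the three count structures.
struct TopicCounts {
  std::vector<std::vector<int>> Cd;  // assumed layout: [document][topic]
  std::vector<std::vector<int>> Cv;  // assumed layout: [topic][token]
  std::vector<int> Ck;               // [topic]
};

// Expand one document's term counts into token indices,
// so counts {2, 0, 1} become indices {0, 0, 2}.
std::vector<int> expand_document(const std::vector<int>& term_counts) {
  std::vector<int> tokens;
  for (std::size_t v = 0; v < term_counts.size(); ++v) {
    for (int n = 0; n < term_counts[v]; ++n) {
      tokens.push_back(static_cast<int>(v));
    }
  }
  return tokens;
}

// Tally Cd, Cv, and Ck by walking Docs and Zd in parallel.
TopicCounts count_topics(const std::vector<std::vector<int>>& docs,
                         const std::vector<std::vector<int>>& zd,
                         std::size_t n_topics, std::size_t n_tokens) {
  assert(docs.size() == zd.size());
  TopicCounts out;
  out.Cd.assign(docs.size(), std::vector<int>(n_topics, 0));
  out.Cv.assign(n_topics, std::vector<int>(n_tokens, 0));
  out.Ck.assign(n_topics, 0);
  for (std::size_t d = 0; d < docs.size(); ++d) {
    assert(docs[d].size() == zd[d].size());
    for (std::size_t n = 0; n < docs[d].size(); ++n) {
      const int v = docs[d][n];  // token index from Docs
      const int k = zd[d][n];    // topic assignment from Zd
      ++out.Cd[d][k];
      ++out.Cv[k][v];
      ++out.Ck[k];
    }
  }
  return out;
}
```

Under these assumptions, every token contributes exactly one count to each of Cd, Cv, and Ck, so the entries of Ck sum to the total number of tokens in the corpus.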