create_lexicon | R Documentation |
One run of the Gibbs sampler and other magic to initialize some objects.
Works in concert with initialize_topic_counts.
create_lexicon(Cd_in, Beta_in, dtm_in, alpha, freeze_topics)
Cd_in | IntegerMatrix denoting counts of topics in documents |
Beta_in | NumericMatrix denoting probability of words in topics |
dtm_in | arma::sp_mat document term matrix |
alpha | NumericVector prior for topics over documents |
freeze_topics | bool if making predictions, set to |
Arguments ending in _in are copied and their copies modified in some way by
this function. In the case of Cd_in and Beta_in, the only modification is that
they are converted from matrices to nested std::vector for speed, reliability,
and thread safety. dtm_in is transposed for speed when looping over columns.
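The matrix-to-nested-vector conversion described above can be sketched roughly as follows. This is an illustration only, not the package's code: it assumes the matrix arrives as a flat column-major buffer (the layout R uses for matrices), and the helper name to_nested is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the package): copy a column-major
// flat buffer into a nested std::vector indexed as out[row][col].
template <typename T>
std::vector<std::vector<T>> to_nested(const std::vector<T>& flat,
                                      std::size_t n_rows,
                                      std::size_t n_cols) {
    // The buffer must hold exactly n_rows * n_cols entries.
    assert(flat.size() == n_rows * n_cols);

    std::vector<std::vector<T>> out(n_rows, std::vector<T>(n_cols));

    // Walk the buffer in storage order: column by column.
    for (std::size_t j = 0; j < n_cols; ++j) {
        for (std::size_t i = 0; i < n_rows; ++i) {
            out[i][j] = flat[j * n_rows + i];
        }
    }
    return out;
}
```

Because the result is a deep copy, later changes to it do not touch the caller's matrix, which is consistent with the statement that the _in arguments are copied before being modified.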
Returns a list with five entries.
Docs is a list of vectors. Each element is a document, and the contents
are indices for tokens. Used as an iterator for the Gibbs sampler.
Zd is a list of vectors, similar to Docs. However, its contents are topic
assignments of each document/token pair. Used as an iterator for Gibbs
sampling.
Cd is a matrix counting the number of times each topic is sampled per
document.
Cv is a matrix counting the number of times each topic is sampled per token.
Ck is a vector counting the total number of times each topic is sampled overall.
Cd, Cv, and Ck are derived from Zd.
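To make the relationship between Zd and the three count objects concrete, here is a minimal sketch of how Cd, Cv, and Ck could be tallied from Docs and Zd. It is not the package's implementation. The struct TopicCounts, the function tally_counts, the zero-based indices, and the matrix orientations (Cd as documents by topics, Cv as topics by tokens) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the three count objects (not the package's type).
struct TopicCounts {
    std::vector<std::vector<long>> Cd;  // assumed layout: documents x topics
    std::vector<std::vector<long>> Cv;  // assumed layout: topics x tokens
    std::vector<long> Ck;               // one total per topic
};

// Tally counts from token indices (docs) and topic assignments (zd).
// docs[d][n] is the token index of the n-th token in document d;
// zd[d][n] is the topic assigned to that same token. Indices are zero-based.
TopicCounts tally_counts(const std::vector<std::vector<std::size_t>>& docs,
                         const std::vector<std::vector<std::size_t>>& zd,
                         std::size_t n_topics,
                         std::size_t n_tokens) {
    assert(docs.size() == zd.size());

    TopicCounts out;
    out.Cd.assign(docs.size(), std::vector<long>(n_topics, 0));
    out.Cv.assign(n_topics, std::vector<long>(n_tokens, 0));
    out.Ck.assign(n_topics, 0);

    for (std::size_t d = 0; d < docs.size(); ++d) {
        // Zd mirrors Docs: one topic assignment per token.
        assert(docs[d].size() == zd[d].size());
        for (std::size_t n = 0; n < docs[d].size(); ++n) {
            const std::size_t v = docs[d][n];  // token index
            const std::size_t k = zd[d][n];    // topic assignment
            assert(v < n_tokens && k < n_topics);
            ++out.Cd[d][k];  // topic k sampled once more in document d
            ++out.Cv[k][v];  // topic k sampled once more for token v
            ++out.Ck[k];     // topic k sampled once more overall
        }
    }
    return out;
}
```

Under these assumptions every token increments each of the three objects exactly once, so each row of Cd sums to the length of its document and Ck sums to the total number of tokens in the corpus.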