create_lexicon: Make a lexicon for looping over in the Gibbs sampler

View source: R/RcppExports.R

create_lexicon R Documentation

Make a lexicon for looping over in the Gibbs sampler

Description

Runs one iteration of the Gibbs sampler and performs related setup to initialize the objects the sampler loops over. Works in concert with initialize_topic_counts.

Usage

create_lexicon(Cd_in, Beta_in, dtm_in, alpha, freeze_topics)

Arguments

Cd_in

IntegerMatrix denoting counts of topics in documents

Beta_in

NumericMatrix denoting probability of words in topics

dtm_in

arma::sp_mat document-term matrix

alpha

NumericVector prior for topics over documents

freeze_topics

bool; set to TRUE when making predictions

Details

Arguments ending in _in are copied and their copies modified in some way by this function. In the case of Cd_in and Beta_in, the only modification is that they are converted from matrices to nested std::vector for speed, reliability, and thread safety. dtm_in is transposed for speed when looping over columns.
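
The matrix-to-nested-vector conversion described above can be sketched in plain C++ as follows. This is only an illustration, not the package's code: the helper name to_nested is hypothetical, and the input is assumed to be a flat column-major buffer (the layout R uses for matrices) rather than an Rcpp matrix type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of tidylda: copy a column-major matrix
// into a row-indexed nested std::vector. The copy is independent of the
// input buffer, so later changes to it never touch the caller's data.
std::vector<std::vector<double>> to_nested(const std::vector<double>& flat,
                                           std::size_t nrow,
                                           std::size_t ncol) {
  assert(flat.size() == nrow * ncol);  // dimensions must match the buffer
  std::vector<std::vector<double>> out(nrow, std::vector<double>(ncol));
  for (std::size_t j = 0; j < ncol; ++j) {
    for (std::size_t i = 0; i < nrow; ++i) {
      out[i][j] = flat[j * nrow + i];  // element (i, j) in column-major order
    }
  }
  return out;
}
```

After the conversion, row i of the matrix is available as out[i], a contiguous std::vector that can be read without going through the original matrix object.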

Value

Returns a list with five entries.

Docs is a list of vectors. Each element is a document, and the contents are indices for tokens. Used as an iterator for the Gibbs sampler.

Zd is a list of vectors, similar to Docs. However, its contents are topic assignments of each document/token pair. Used as an iterator for Gibbs sampling.

Cd is a matrix counting the number of times each topic is sampled per document.

Cv is a matrix counting the number of times each topic is sampled per token.

Ck is a vector counting the total number of times each topic is sampled overall.

Cd, Cv, and Ck are derived from Zd.
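
To make the relationship between the returned entries concrete, here is a minimal sketch of how the three count objects could be tallied from Docs and Zd. It is not taken from the package source: the struct TopicCounts, the function tally_counts, zero-based indices, and the orientation of each matrix (documents by topics for Cd, topics by tokens for Cv) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container; the names mirror the entries described above,
// but the matrix orientations are assumptions.
struct TopicCounts {
  std::vector<std::vector<long>> Cd;  // documents x topics
  std::vector<std::vector<long>> Cv;  // topics x tokens
  std::vector<long> Ck;               // one total per topic
};

// Hypothetical helper: walk every document/token pair once and increment
// the three counts for the topic assigned to that pair.
TopicCounts tally_counts(const std::vector<std::vector<std::size_t>>& docs,
                         const std::vector<std::vector<std::size_t>>& zd,
                         std::size_t n_topics,
                         std::size_t n_tokens) {
  assert(docs.size() == zd.size());  // one assignment vector per document
  TopicCounts c;
  c.Cd.assign(docs.size(), std::vector<long>(n_topics, 0));
  c.Cv.assign(n_topics, std::vector<long>(n_tokens, 0));
  c.Ck.assign(n_topics, 0);
  for (std::size_t d = 0; d < docs.size(); ++d) {
    assert(docs[d].size() == zd[d].size());  // one topic per token instance
    for (std::size_t n = 0; n < docs[d].size(); ++n) {
      const std::size_t v = docs[d][n];  // token index
      const std::size_t k = zd[d][n];    // topic assigned to this instance
      assert(v < n_tokens && k < n_topics);
      ++c.Cd[d][k];
      ++c.Cv[k][v];
      ++c.Ck[k];
    }
  }
  return c;
}
```

In this sketch every token instance contributes exactly one count to each object, so the entries of Ck equal the column sums of Cd and the row sums of Cv.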


tidylda documentation built on July 26, 2023, 5:34 p.m.