| fit_lda_c | R Documentation |
This is the C++ Gibbs sampler for LDA. "Abandon all hope, ye who enter here."
fit_lda_c(
Docs,
Zd_in,
Cd_in,
Cv_in,
Ck_in,
alpha_in,
eta_in,
iterations,
burnin,
optimize_alpha,
calc_likelihood,
Beta_in,
freeze_topics,
threads = 1L,
verbose = TRUE
)
Docs |
List with one element for each document and one entry for each token
as formatted by |
Zd_in |
List with one element for each document and one entry for each token
as formatted by |
Cd_in |
IntegerMatrix denoting counts of topics in documents |
Cv_in |
IntegerMatrix denoting counts of tokens in topics |
Ck_in |
IntegerVector denoting counts of topics across all tokens |
alpha_in |
NumericVector prior for topics over documents |
eta_in |
NumericMatrix for prior of tokens over topics |
iterations |
int number of Gibbs iterations to run in total |
burnin |
int number of burn-in iterations |
optimize_alpha |
bool do you want to optimize alpha each iteration? |
calc_likelihood |
bool do you want to calculate the log likelihood each iteration? |
Beta_in |
NumericMatrix denoting probability of tokens in topics |
freeze_topics |
bool if making predictions, set to |
threads |
unsigned integer, how many parallel threads? For now, nothing is actually parallel |
verbose |
bool do you want to print out a progress bar? |
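The count arguments are related to the token-level inputs, and a small sketch may make that relationship easier to see. The code below is not taken from the package. It assumes that Docs holds 0-based vocabulary indices, that Zd_in holds a 0-based topic assignment for each token, that Cd is laid out documents by topics, and that Cv is laid out topics by vocabulary. The names TopicCounts and count_assignments are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the three count structures. The name and the
// matrix orientations are assumptions made for this sketch.
struct TopicCounts {
  std::vector<std::vector<int>> Cd;  // documents x topics (assumed)
  std::vector<std::vector<int>> Cv;  // topics x vocabulary (assumed)
  std::vector<int> Ck;               // number of tokens assigned to each topic
};

// Tally counts from token indices (docs) and topic assignments (zd).
// Both are assumed to be 0-based and to have matching shapes.
TopicCounts count_assignments(const std::vector<std::vector<int>>& docs,
                              const std::vector<std::vector<int>>& zd,
                              std::size_t n_topics, std::size_t n_vocab) {
  assert(docs.size() == zd.size());
  TopicCounts out;
  out.Cd.assign(docs.size(), std::vector<int>(n_topics, 0));
  out.Cv.assign(n_topics, std::vector<int>(n_vocab, 0));
  out.Ck.assign(n_topics, 0);
  for (std::size_t d = 0; d < docs.size(); ++d) {
    assert(docs[d].size() == zd[d].size());
    for (std::size_t n = 0; n < docs[d].size(); ++n) {
      const std::size_t v = static_cast<std::size_t>(docs[d][n]);
      const std::size_t k = static_cast<std::size_t>(zd[d][n]);
      assert(v < n_vocab && k < n_topics);
      out.Cd[d][k] += 1;  // topic k used once more in document d
      out.Cv[k][v] += 1;  // token v assigned once more to topic k
      out.Ck[k] += 1;     // topic k used once more overall
    }
  }
  return out;
}
```

Under these assumptions, each row of Cd sums to the length of the corresponding document, and Ck equals the row sums of Cv.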
Arguments ending in _in are copied and their copies modified in
some way by this function. In the case of eta_in and Beta_in,
the only modification is that they are converted from matrices to nested
std::vector for speed, reliability, and thread safety. In the case
of all others, they may be explicitly modified during training.
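As an illustration of the matrix-to-nested-vector conversion described above, here is a minimal sketch in plain C++. It is not the package's code. It assumes the matrix arrives as a flat buffer in column-major order, which is how R stores matrices, and the function name to_nested is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy a column-major flat buffer into a nested vector indexed as
// out[row][col]. The result owns its data, so later changes to it do not
// touch the original buffer.
std::vector<std::vector<double>> to_nested(const std::vector<double>& flat,
                                           std::size_t n_row,
                                           std::size_t n_col) {
  assert(flat.size() == n_row * n_col);
  std::vector<std::vector<double>> out(n_row, std::vector<double>(n_col));
  for (std::size_t j = 0; j < n_col; ++j) {
    for (std::size_t i = 0; i < n_row; ++i) {
      // Element (i, j) sits at offset j * n_row + i in column-major storage.
      out[i][j] = flat[j * n_row + i];
    }
  }
  return out;
}
```

Because the nested vector is an independent copy, code that reads it from several threads does not depend on the lifetime or state of the R object it came from.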
Returns a list with the following entries.
Cd is a matrix counting the number of times each topic is sampled per
document.
Cv is a matrix counting the number of times each topic is sampled per token.
Cd_mean is the same as Cd, but with values averaged across the
iterations after burnin.
Cv_mean is the same as Cv, but with values averaged across the
iterations after burnin.
Cd_sum is the same as Cd, but with values summed across the
iterations after burnin.
Cv_sum is the same as Cv, but with values summed across the
iterations after burnin.
log_likelihood a matrix with one row indexing iterations and one
row of the log likelihood for each iteration.
alpha a vector of the document-topic prior
_eta a matrix of the topic-token prior
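The relationship between the summed and averaged entries can be sketched as a running accumulator. This is an illustration only, not the package's implementation. It assumes iterations are numbered from 1 and that an iteration is kept when its number is greater than burnin. The name PostBurninAccumulator is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical accumulator for a count matrix sampled once per iteration.
struct PostBurninAccumulator {
  std::vector<std::vector<double>> sum;  // element-wise sum of kept matrices
  std::size_t kept = 0;                  // number of kept iterations

  // Add one iteration's counts if the (1-based, assumed) iteration number
  // is greater than burnin; otherwise ignore it.
  void update(const std::vector<std::vector<int>>& counts,
              std::size_t iteration, std::size_t burnin) {
    if (iteration <= burnin) return;
    if (sum.empty()) {
      const std::size_t n_col = counts.empty() ? 0 : counts[0].size();
      sum.assign(counts.size(), std::vector<double>(n_col, 0.0));
    }
    assert(sum.size() == counts.size());
    for (std::size_t i = 0; i < counts.size(); ++i) {
      assert(sum[i].size() == counts[i].size());
      for (std::size_t j = 0; j < counts[i].size(); ++j) {
        sum[i][j] += counts[i][j];
      }
    }
    ++kept;
  }

  // Element-wise mean over the kept iterations.
  std::vector<std::vector<double>> mean() const {
    assert(kept > 0);
    std::vector<std::vector<double>> out = sum;
    for (auto& row : out) {
      for (auto& x : row) x /= static_cast<double>(kept);
    }
    return out;
  }
};
```

Under these assumptions, the mean entries equal the sum entries divided by the number of iterations after burnin.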