fit_lda_c | R Documentation
This is the C++ Gibbs sampler for LDA. "Abandon all hope, ye who enter here."
fit_lda_c(
  Docs,
  Zd_in,
  Cd_in,
  Cv_in,
  Ck_in,
  alpha_in,
  eta_in,
  iterations,
  burnin,
  optimize_alpha,
  calc_likelihood,
  Beta_in,
  freeze_topics,
  threads = 1L,
  verbose = TRUE
)
Docs | List with one element for each document and one entry for each token as formatted by
Zd_in | List with one element for each document and one entry for each token as formatted by
Cd_in | IntegerMatrix denoting counts of topics in documents
Cv_in | IntegerMatrix denoting counts of tokens in topics
Ck_in | IntegerVector denoting counts of topics across all tokens
alpha_in | NumericVector prior for topics over documents
eta_in | NumericMatrix for prior of tokens over topics
iterations | int number of Gibbs iterations to run in total
burnin | int number of burn-in iterations
optimize_alpha | bool do you want to optimize alpha each iteration?
calc_likelihood | bool do you want to calculate the log likelihood each iteration?
Beta_in | NumericMatrix denoting probability of tokens in topics
freeze_topics | bool if making predictions, set to
threads | unsigned integer, how many parallel threads? For now, nothing is actually parallel.
verbose | bool do you want to print out a progress bar?
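To make the relationship between the count arguments concrete, here is a minimal sketch of how Cd, Cv, and Ck could be tallied from per-token topic assignments. It is not the package's code. It assumes that Docs holds zero-based vocabulary indices, that Zd holds zero-based topic indices for the same tokens, that Cd is laid out documents by topics, and that Cv is laid out topics by vocabulary. The names TopicCounts and count_topics are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the three count structures described above.
struct TopicCounts {
  std::vector<std::vector<int>> Cd;  // documents x topics (assumed orientation)
  std::vector<std::vector<int>> Cv;  // topics x vocabulary (assumed orientation)
  std::vector<int> Ck;               // total tokens assigned to each topic
};

// Tally counts from token indices (docs) and topic assignments (zd).
// Both inputs have one inner vector per document and one entry per token.
TopicCounts count_topics(const std::vector<std::vector<int>>& docs,
                         const std::vector<std::vector<int>>& zd,
                         std::size_t n_topics,
                         std::size_t n_vocab) {
  assert(docs.size() == zd.size());

  TopicCounts out;
  out.Cd.assign(docs.size(), std::vector<int>(n_topics, 0));
  out.Cv.assign(n_topics, std::vector<int>(n_vocab, 0));
  out.Ck.assign(n_topics, 0);

  for (std::size_t d = 0; d < docs.size(); ++d) {
    assert(docs[d].size() == zd[d].size());
    for (std::size_t n = 0; n < docs[d].size(); ++n) {
      const std::size_t v = static_cast<std::size_t>(docs[d][n]);
      const std::size_t k = static_cast<std::size_t>(zd[d][n]);
      assert(v < n_vocab && k < n_topics);
      ++out.Cd[d][k];  // topic k seen once more in document d
      ++out.Cv[k][v];  // token v assigned to topic k once more
      ++out.Ck[k];     // topic k seen once more overall
    }
  }
  return out;
}
```

Under these assumptions, every row of Cd sums to the document's token count and the entries of Ck sum to the total number of tokens, which is a convenient sanity check on inputs.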
Arguments ending in _in are copied and their copies modified in some way by this function. In the case of eta_in and Beta_in, the only modification is that they are converted from matrices to nested std::vector for speed, reliability, and thread safety. In the case of all others, they may be explicitly modified during training.
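The matrix-to-nested-vector conversion can be sketched as follows. This is an illustration, not the package's implementation: a flat std::vector in column-major order stands in for the R matrix (R stores matrices column by column), and the helper name to_nested is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy a column-major flat buffer into a row-indexed nested vector,
// so that out[i][j] is the element in row i, column j.
std::vector<std::vector<double>> to_nested(const std::vector<double>& flat,
                                           std::size_t nrow,
                                           std::size_t ncol) {
  assert(flat.size() == nrow * ncol);

  std::vector<std::vector<double>> out(nrow, std::vector<double>(ncol));
  for (std::size_t j = 0; j < ncol; ++j) {
    for (std::size_t i = 0; i < nrow; ++i) {
      out[i][j] = flat[i + j * nrow];  // column-major offset
    }
  }
  return out;
}
```

Because the result is an independent copy, later changes to it cannot touch the caller's original matrix.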
Returns a list with the following entries.
Cd is a matrix counting the number of times each topic is sampled per document.
Cv is a matrix counting the number of times each topic is sampled per token.
Cd_mean is the same as Cd, but with values averaged across the iterations after the first burnin iterations.
Cv_mean is the same as Cv, but with values averaged across the iterations after the first burnin iterations.
Cd_sum is the same as Cd, but with values summed across the iterations after the first burnin iterations.
Cv_sum is the same as Cv, but with values summed across the iterations after the first burnin iterations.
log_likelihood is a matrix with one row indexing iterations and one row holding the log likelihood for each iteration.
alpha is a vector of the document-topic prior.
_eta is a matrix of the topic-token prior.
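The relationship between the summed and averaged outputs can be sketched as below. This is a hedged illustration rather than the package's code: it assumes iterations are numbered from zero, that an iteration is kept when its index is at least burnin, and that the mean is the sum divided by the number of kept iterations. The class name BurninAccumulator is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using IntMatrix = std::vector<std::vector<int>>;
using NumMatrix = std::vector<std::vector<double>>;

// Accumulates a count matrix over post-burn-in iterations only.
class BurninAccumulator {
 public:
  BurninAccumulator(std::size_t nrow, std::size_t ncol, std::size_t burnin)
      : sum_(nrow, std::vector<int>(ncol, 0)), burnin_(burnin), kept_(0) {}

  // Record the counts from iteration t (zero-based, an assumption).
  void observe(std::size_t t, const IntMatrix& counts) {
    if (t < burnin_) return;  // still burning in: ignore this draw
    assert(counts.size() == sum_.size());
    for (std::size_t i = 0; i < sum_.size(); ++i) {
      assert(counts[i].size() == sum_[i].size());
      for (std::size_t j = 0; j < sum_[i].size(); ++j) {
        sum_[i][j] += counts[i][j];
      }
    }
    ++kept_;
  }

  // Element-wise sum over kept iterations (analogous to Cd_sum or Cv_sum).
  const IntMatrix& sum() const { return sum_; }

  // Element-wise mean over kept iterations (analogous to Cd_mean or Cv_mean).
  NumMatrix mean() const {
    assert(kept_ > 0);
    NumMatrix out(sum_.size());
    for (std::size_t i = 0; i < sum_.size(); ++i) {
      out[i].reserve(sum_[i].size());
      for (int x : sum_[i]) {
        out[i].push_back(static_cast<double>(x) / static_cast<double>(kept_));
      }
    }
    return out;
  }

 private:
  IntMatrix sum_;
  std::size_t burnin_;
  std::size_t kept_;
};
```

Averaging over several post-burn-in draws smooths out the noise in any single draw, which is why the mean is usually preferred over the final-iteration counts when estimating topic distributions.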