recover_counts_from_probs | R Documentation |
This function is a core component of initialize_topic_counts. See details, below.
recover_counts_from_probs(prob_matrix, prior_matrix, total_vector)
prob_matrix |
a numeric |
prior_matrix |
a matrix of same dimension as |
total_vector |
a vector of token counts of length |
This function uses a probability matrix (theta or beta), its prior (alpha or eta, respectively), and a vector of counts to simulate what the Cd or Cv matrix would be at the end of a Gibbs run that resulted in that probability matrix.
For example, theta is calculated from a matrix of counts, Cd, and a prior, alpha. Specifically, the i,j entry of theta is given by
(Cd[i, j] + alpha[i, j]) / sum(Cd[, j] + alpha[, j])
Similarly, beta comes from
(Cv[i, j] + eta[i, j]) / sum(Cv[, j] + eta[, j])
(The above are written to be general with respect to alpha and eta being matrices. They could also be vectors or scalars.)
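As a minimal sketch of the theta formula above, the following R snippet builds a small Cd and alpha and normalizes each column exactly as the formula is written. The matrices and their values are made up for illustration; only the formula comes from the text.

```r
# Hypothetical toy inputs (3 rows by 2 columns), indexed [i, j] as in the formula.
# matrix() fills column by column: column 1 is (4, 1, 0), column 2 is (2, 2, 6).
Cd <- matrix(c(4, 1, 0,
               2, 2, 6), nrow = 3)
alpha <- matrix(0.1, nrow = 3, ncol = 2)

# theta[i, j] = (Cd[i, j] + alpha[i, j]) / sum(Cd[, j] + alpha[, j])
smoothed <- Cd + alpha
theta <- sweep(smoothed, 2, colSums(smoothed), "/")

# Each column of theta sums to 1 (up to floating point error)
colSums(theta)
```

The same pattern applies to beta, with Cv and eta in place of Cd and alpha.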
So, this function uses the above formulas to try to reconstruct Cd or Cv from theta and alpha or beta and eta, respectively. As of this writing, this method is experimental. In the future, there will be a paper with more technical details cited here.
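To illustrate the idea of reconstruction, the sketch below inverts the theta formula algebraically. It is not the package's implementation, which is described as a simulation and is still experimental. The helper name invert_probs_sketch is hypothetical, and the sketch assumes that total_vector holds one token total per column of prob_matrix.

```r
# Hypothetical helper, NOT part of the package: a plain algebraic inversion of
#   prob[i, j] = (count[i, j] + prior[i, j]) / sum(count[, j] + prior[, j])
# Assumption: total_vector[j] is the sum of counts in column j.
invert_probs_sketch <- function(prob_matrix, prior_matrix, total_vector) {
  stopifnot(identical(dim(prob_matrix), dim(prior_matrix)),
            length(total_vector) == ncol(prob_matrix))
  # Denominator of the formula for each column j
  denom <- total_vector + colSums(prior_matrix)
  # Scale column j by denom[j], then remove the prior
  counts <- sweep(prob_matrix, 2, denom, "*") - prior_matrix
  # Clamp tiny negative values caused by floating point error
  pmax(counts, 0)
}

# Round trip on made-up data: counts -> theta -> counts
Cd <- matrix(c(4, 1, 0,
               2, 2, 6), nrow = 3)
alpha <- matrix(0.1, nrow = 3, ncol = 2)
theta <- sweep(Cd + alpha, 2, colSums(Cd + alpha), "/")

recovered <- invert_probs_sketch(theta, alpha, total_vector = colSums(Cd))
all.equal(recovered, Cd)
```

This inversion is exact only when the inputs really came from the formula. For an arbitrary probability matrix the result is generally not integer valued, which is one reason a simulation-based approach may be preferable.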
The priors must be matrices for the purposes of the function. This is to support topic seeding and model updates. The former requires eta to be a matrix. The latter may require eta to be a matrix. Here, alpha is also required to be a matrix for compatibility.
All that said, for now initialize_topic_counts only uses this function to calculate Cd.
Returns a matrix of the number of times each topic was sampled for each document (Cd) or for each token (Cv), depending on whether prob_matrix/prior_matrix corresponds to theta/alpha or beta/eta, respectively.