recover_counts_from_probs: Get Count Matrices from Beta or Theta (and Priors)


View source: R/utils.R


This function is a core component of initialize_topic_counts. See details, below.


recover_counts_from_probs(prob_matrix, prior_matrix, total_vector)



prob_matrix: a numeric beta or theta matrix


prior_matrix: a matrix of the same dimension as prob_matrix whose entries represent the relevant prior (alpha or eta)


total_vector: a vector of token counts of length ncol(prob_matrix)


This function uses a probability matrix (theta or beta), its prior (alpha or eta, respectively), and a vector of counts to simulate what the Cd or Cv matrix would be at the end of a Gibbs run that resulted in that probability matrix.

For example, theta is calculated from a matrix of counts, Cd, and a prior, alpha. Specifically, the i,j entry of theta is given by

(Cd[i, j] + alpha[i, j]) / sum(Cd[, j] + alpha[, j])

Similarly, beta comes from

(Cv[i, j] + eta[i, j]) / sum(Cv[, j] + eta[, j])

(The above are written to be general with respect to alpha and eta being matrices. They could also be vectors or scalars.)
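As a minimal sketch, the theta formula above can be evaluated for a whole matrix at once. The toy values of Cd and alpha are hypothetical, and the layout takes the formula literally, so that normalization runs over each column j. This is an illustration in base R, not code from tidylda.

```r
# Hypothetical toy inputs: 3 rows by 2 columns. matrix() fills column by
# column, so column 1 is (4, 1, 0) and column 2 is (2, 2, 6).
Cd <- matrix(c(4, 1, 0, 2, 2, 6), nrow = 3)
alpha <- matrix(0.1, nrow = nrow(Cd), ncol = ncol(Cd))

# Numerator of the formula: counts plus prior, entry by entry.
numerator <- Cd + alpha

# Denominator of the formula: one sum per column j.
denominator <- colSums(numerator)

# Divide every column j by its own denominator.
theta <- sweep(numerator, MARGIN = 2, STATS = denominator, FUN = "/")

# Under this layout each column of theta sums to 1.
colSums(theta)
```

The same three steps apply to beta, with Cv and eta in place of Cd and alpha.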

This function therefore inverts the above formulas to try to reconstruct Cd from theta and alpha, or Cv from beta and eta. As of this writing, this method is experimental. In the future, there will be a paper with more technical details cited here.
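The algebra behind the reconstruction can be sketched by solving the formula for the count matrix: multiply each column of the probability matrix by its denominator, then subtract the prior. The helper invert_probs below is hypothetical and is not the tidylda implementation. It covers only the deterministic algebra, assumes the column-wise layout of the formulas above, and makes no attempt to round or reconcile the result to integer counts.

```r
# Hypothetical helper: algebraic inversion only, no rounding or sampling.
invert_probs <- function(prob_matrix, prior_matrix, total_vector) {
  stopifnot(
    identical(dim(prob_matrix), dim(prior_matrix)),
    length(total_vector) == ncol(prob_matrix)
  )
  # Denominator of the formula: total count plus total prior in each column.
  denominator <- total_vector + colSums(prior_matrix)
  # Multiply each column by its denominator, then subtract the prior.
  sweep(prob_matrix, MARGIN = 2, STATS = denominator, FUN = "*") - prior_matrix
}

# Hypothetical toy data: build theta from known counts, then invert it.
Cd <- matrix(c(4, 1, 0, 2, 2, 6), nrow = 3)
alpha <- matrix(0.1, nrow = 3, ncol = 2)
theta <- sweep(Cd + alpha, 2, colSums(Cd + alpha), "/")

recovered <- invert_probs(theta, alpha, total_vector = colSums(Cd))

# Compare with a numeric tolerance, since the result is floating point.
all.equal(recovered, Cd)
```

When theta comes from a fitted model rather than from known counts, the recovered values are generally not whole numbers, so an additional step would be needed to turn them into counts.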

The priors must be matrices for the purposes of the function. This is to support topic seeding and model updates. The former requires eta to be a matrix. The latter may require eta to be a matrix. Here, alpha is also required to be a matrix for compatibility.
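A scalar or vector prior can be expanded to a matrix before it is passed as prior_matrix. The sketch below uses hypothetical dimensions and values. It also assumes that a vector prior holds one value per row of prob_matrix, which the text above does not state.

```r
# Hypothetical dimensions of prob_matrix.
n_row <- 3
n_col <- 2

# Scalar prior: repeat the same value in every cell.
alpha_scalar <- 0.1
alpha_from_scalar <- matrix(alpha_scalar, nrow = n_row, ncol = n_col)

# Vector prior, ASSUMED to hold one value per row. matrix() fills column by
# column, so each column receives a full copy of the vector.
alpha_vector <- c(0.1, 0.2, 0.3)
alpha_from_vector <- matrix(alpha_vector, nrow = n_row, ncol = n_col)
```

If the vector instead held one value per column, passing byrow = TRUE to matrix() would place a full copy of it in each row.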

All that said, for now initialize_topic_counts only uses this function to calculate Cd.


Returns a matrix giving the number of times each topic was sampled for each document (Cd) or for each token (Cv), depending on whether prob_matrix and prior_matrix correspond to theta and alpha or to beta and eta, respectively.

tidylda documentation built on Dec. 11, 2021, 10:02 a.m.