recover_counts_from_probs: Get Count Matrices from Beta or Theta (and Priors)
In tidylda: Latent Dirichlet Allocation Using 'tidyverse' Conventions

Description

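Reconstructs a matrix of counts (Cd or Cv) from a probability matrix (theta or beta) and its prior (alpha or eta). This function is a core component of initialize_topic_counts. See Details, below.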

Usage

recover_counts_from_probs(prob_matrix, prior_matrix, total_vector)

Arguments

`prob_matrix`	a numeric `beta` or `theta` matrix
`prior_matrix`	a matrix of the same dimension as `prob_matrix` whose entries represent the relevant prior (`alpha` or `eta`)
`total_vector`	a vector of token counts of length `ncol(prob_matrix)`

Details

This function uses a probability matrix (theta or beta), its prior (alpha or eta, respectively), and a vector of counts to simulate what the Cd or Cv matrix would be at the end of a Gibbs run that resulted in that probability matrix.

For example, theta is calculated from a matrix of counts, Cd, and a prior, alpha. Specifically, the i,j entry of theta is given by

(Cd[i, j] + alpha[i, j]) / sum(Cd[, j] + alpha[, j])

Similarly, beta comes from

(Cv[i, j] + eta[i, j]) / sum(Cv[, j] + eta[, j])

(The above are written to be general with respect to alpha and eta being matrices. They could also be vectors or scalars.)
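As a minimal illustration of the theta formula above, the following base R sketch builds a small made-up count matrix `Cd` and prior `alpha`, then normalizes each column, following the indexing exactly as written (sums are taken over column `j`). The values and object names are hypothetical and are not taken from tidylda.

```r
# Hypothetical toy inputs: 3 rows by 2 columns, indexed as in the formula above
Cd <- matrix(c(4, 1, 0,
               2, 2, 6), nrow = 3, ncol = 2)  # filled column by column
alpha <- matrix(0.1, nrow = 3, ncol = 2)      # prior written as a matrix

# theta[i, j] = (Cd[i, j] + alpha[i, j]) / sum(Cd[, j] + alpha[, j])
smoothed <- Cd + alpha
theta <- sweep(smoothed, 2, colSums(smoothed), "/")

# Under this indexing, each column of theta sums to 1
colSums(theta)
```

The same two lines apply to beta by substituting `Cv` and `eta`.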

So, this function uses the above formulas to try to reconstruct Cd from theta and alpha, or Cv from beta and eta. As of this writing, this method is experimental. In the future, there will be a paper with more technical details cited here.
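Rearranging the formulas above gives one way to see how such a reconstruction could work. If `total_vector[j]` equals `sum(C[, j])`, then `C[i, j] = prob_matrix[i, j] * (total_vector[j] + sum(prior_matrix[, j])) - prior_matrix[i, j]`. The sketch below applies that algebra in base R. The helper name `recover_counts_sketch` is hypothetical, and this is not claimed to be the package's implementation, which may handle rounding and edge cases differently.

```r
# Hypothetical helper: algebraic inversion of the formulas above.
# Not the tidylda implementation; no rounding or edge-case handling.
recover_counts_sketch <- function(prob_matrix, prior_matrix, total_vector) {
  stopifnot(
    identical(dim(prob_matrix), dim(prior_matrix)),
    length(total_vector) == ncol(prob_matrix)
  )
  # Denominator of the formula for each column j
  denom <- total_vector + colSums(prior_matrix)
  # Scale column j by denom[j], then subtract the prior
  sweep(prob_matrix, 2, denom, "*") - prior_matrix
}

# Round trip on made-up data
Cd <- matrix(c(4, 1, 0, 2, 2, 6), nrow = 3, ncol = 2)
alpha <- matrix(0.1, nrow = 3, ncol = 2)
theta <- sweep(Cd + alpha, 2, colSums(Cd + alpha), "/")

recovered <- recover_counts_sketch(theta, alpha, total_vector = colSums(Cd))

# Compare with the original counts (up to floating-point error)
all.equal(recovered, Cd)
```

In this idealized round trip the probabilities are exact, so the counts come back unchanged. With a probability matrix from a real model, the result would generally not be integer-valued and would need some rounding step, which this sketch leaves out.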

The priors must be matrices for the purposes of the function. This is to support topic seeding and model updates. The former requires eta to be a matrix. The latter may require eta to be a matrix. Here, alpha is also required to be a matrix for compatibility.
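Because the priors must be supplied as matrices, a scalar or vector prior has to be expanded first. The following base R sketch shows one way to do that, assuming the indexing of the formulas above (the prior varies along rows `i` and is repeated across columns `j`). The dimensions, values, and object names are made up for illustration. The last lines only hint at why a matrix is needed when individual cells differ, as in topic seeding.

```r
n_row <- 3  # hypothetical dimensions, chosen to match prob_matrix
n_col <- 2

# Scalar prior: the same value in every cell
alpha_scalar <- matrix(0.1, nrow = n_row, ncol = n_col)

# Vector prior with one value per row, repeated across columns
# (matrix() fills column by column, so each column equals alpha_vec)
alpha_vec <- c(0.1, 0.5, 0.2)
alpha_matrix <- matrix(alpha_vec, nrow = n_row, ncol = n_col)

# Seeding-style prior (illustrative only): start flat, then raise one cell
eta_seeded <- matrix(0.05, nrow = n_row, ncol = n_col)
eta_seeded[1, 2] <- 1
```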

All that said, for now initialize_topic_counts only uses this function to calculate Cd.

Value

Returns a matrix corresponding to the number of times each topic was sampled for each document (Cd) or for each token (Cv), depending on whether prob_matrix/prior_matrix correspond to theta/alpha or beta/eta, respectively.