# recover_counts_from_probs: Get Count Matrices from Beta or Theta (and Priors)

In tidylda: Latent Dirichlet Allocation Using 'tidyverse' Conventions

## Description

This function is a core component of `initialize_topic_counts`. See Details below.

## Usage

```r
recover_counts_from_probs(prob_matrix, prior_matrix, total_vector)
```

## Arguments

| Argument | Description |
|---|---|
| `prob_matrix` | a numeric `beta` or `theta` matrix |
| `prior_matrix` | a matrix of the same dimension as `prob_matrix` whose entries represent the relevant prior (`alpha` or `eta`) |
| `total_vector` | a vector of token counts of length `ncol(prob_matrix)` |

## Details

This function uses a probability matrix (theta or beta), its prior (alpha or eta, respectively), and a vector of counts to simulate what the Cd or Cv matrix would be at the end of a Gibbs run that resulted in that probability matrix.

For example, theta is calculated from a matrix of counts, Cd, and a prior, alpha. Specifically, the i,j entry of theta is given by

`(Cd[i, j] + alpha[i, j]) / sum(Cd[, j] + alpha[, j])`

Similarly, beta comes from

`(Cv[i, j] + eta[i, j]) / sum(Cv[, j] + eta[, j])`

(The above are written to be general with respect to alpha and eta being matrices. They could also be vectors or scalars.)
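As a minimal sketch of the theta formula above, the following R snippet applies it to a toy example. The values in `Cd` and the constant prior of 0.1 are made up for illustration, and the snippet follows the indexing exactly as written in the formula (each column `j` is normalized by its own sum). It is not taken from the tidylda source.

```r
# Toy count matrix: rows indexed by i, columns by j (values are made up).
Cd <- matrix(c(3, 1,
               0, 4,
               2, 5), nrow = 3, byrow = TRUE)

# Prior of the same dimension as Cd; the constant 0.1 is an arbitrary choice.
alpha <- matrix(0.1, nrow = nrow(Cd), ncol = ncol(Cd))

# Apply (Cd[i, j] + alpha[i, j]) / sum(Cd[, j] + alpha[, j]) to every entry:
# add the prior, then divide each column by its column sum.
smoothed <- Cd + alpha
theta <- sweep(smoothed, 2, colSums(smoothed), "/")

# Each column of theta is now a probability distribution.
colSums(theta)
```

The same steps apply to beta by substituting `Cv` and `eta`.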

So, this function uses the above formulas to try to reconstruct Cd from theta and alpha, or Cv from beta and eta. As of this writing, this method is experimental. In the future, there will be a paper with more technical details cited here.
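One way to invert the formulas is sketched below. The helper `recover_counts_sketch` is hypothetical and is not the tidylda implementation: it assumes `total_vector[j]` equals the sum of the counts in column `j`, so the denominator of the formula is `total_vector[j] + sum(prior_matrix[, j])`. The rounding and clamping at the end are also assumptions; the package may reconcile non-integer or negative values differently.

```r
# Hypothetical helper illustrating the inversion; not the tidylda source.
recover_counts_sketch <- function(prob_matrix, prior_matrix, total_vector) {
  stopifnot(
    identical(dim(prob_matrix), dim(prior_matrix)),
    length(total_vector) == ncol(prob_matrix)
  )
  # Denominator of the formula for column j: total counts plus total prior mass.
  denom <- total_vector + colSums(prior_matrix)
  # Undo the normalization column by column, then subtract the prior.
  counts <- sweep(prob_matrix, 2, denom, "*") - prior_matrix
  # Counts should be non-negative integers; rounding and clamping at zero
  # are assumptions made for this sketch.
  pmax(round(counts), 0)
}

# Round trip on toy data (values are made up): build theta from known
# counts, then try to recover those counts.
Cd <- matrix(c(3, 1,
               0, 4,
               2, 5), nrow = 3, byrow = TRUE)
alpha <- matrix(0.1, nrow = nrow(Cd), ncol = ncol(Cd))
theta <- sweep(Cd + alpha, 2, colSums(Cd + alpha), "/")

recovered <- recover_counts_sketch(theta, alpha, colSums(Cd))
all.equal(recovered, Cd)
```

When the probability matrix did not come from exact counts (for example, after averaging over Gibbs iterations), the recovered values will generally be approximations rather than the true counts.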

The priors must be matrices for the purposes of this function. This supports topic seeding and model updates: the former requires eta to be a matrix, and the latter may require it. Alpha is also required to be a matrix for compatibility.

All that said, for now `initialize_topic_counts` only uses this function to calculate Cd.

## Value

Returns a matrix corresponding to the number of times each topic was sampled for each document (`Cd`) or for each token (`Cv`), depending on whether `prob_matrix`/`prior_matrix` corresponds to `theta`/`alpha` or `beta`/`eta`, respectively.

tidylda documentation built on Dec. 11, 2021, 10:02 a.m.