sim_LDA_data: Simulate LDA data from an LDA structure given parameters
In LDATS: Latent Dirichlet Allocation Coupled with Time Series Analyses

sim_LDA_data

R Documentation

Description

Given the parameters `alpha` (or `Theta`) and `Beta` and the document-specific total word counts `N`, simulate a document-by-term matrix of word counts. The remaining structural dimensions are inferred from the inputs: the number of documents (M) from `N`, and the numbers of topics (k) and terms (V) from `Beta`.
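
The generative process described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the LDATS source: the function name `sketch_sim_lda` is hypothetical, the Dirichlet draw is done with normalized gamma variates, and each document's words are assumed to be split first among topics and then among terms.

```r
# Hypothetical base-R sketch of an LDA simulator; not the LDATS implementation.
sketch_sim_lda <- function(N, Beta, alpha = NULL, Theta = NULL, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  M <- length(N)   # number of documents, inferred from N
  k <- nrow(Beta)  # number of topics, inferred from Beta
  V <- ncol(Beta)  # number of terms, inferred from Beta
  if (is.null(Theta)) {
    stopifnot(is.numeric(alpha), length(alpha) == 1, alpha > 0)
    # Symmetric Dirichlet(alpha) draws: normalize independent gamma variates
    g <- matrix(rgamma(M * k, shape = alpha), nrow = M, ncol = k)
    Theta <- g / rowSums(g)
  }
  counts <- matrix(0L, nrow = M, ncol = V)
  for (d in seq_len(M)) {
    # Allocate the N[d] words of document d among the k topics
    per_topic <- rmultinom(1, size = N[d], prob = Theta[d, ])
    for (j in seq_len(k)) {
      # Allocate each topic's words among the V terms
      draws <- rmultinom(1, size = per_topic[j], prob = Beta[j, ])
      counts[d, ] <- counts[d, ] + as.vector(draws)
    }
  }
  counts
}

N <- c(10, 22, 15, 31)
Beta <- matrix(c(0.1, 0.1, 0.8, 0.2, 0.6, 0.2), 2, 3, byrow = TRUE)
dtm <- sketch_sim_lda(N, Beta, alpha = 1.2, seed = 1)
dim(dtm)      # M x V
rowSums(dtm)  # matches N by construction in this sketch
```

In this sketch the row sums of the result equal `N`, because every word of a document is assigned to exactly one topic and then to exactly one term.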

Usage

sim_LDA_data(N, Beta, alpha = NULL, Theta = NULL, seed = NULL)

Arguments

`N`	A vector of document sizes (total word counts). Must be integer-conformable. Used to infer the total number of documents (M).
`Beta`	`matrix` of categorical distribution parameters defining terms within topics. Dimension: k x V (number of topics x number of terms). Used to infer both k and V. Must be non-negative and sum to 1 within topics.
`alpha`	Single positive numeric value for the Dirichlet distribution parameter defining topics within documents. To specifically define document topic probabilities, use `Theta`.
`Theta`	`matrix` of probabilities defining topics within documents. Dimension: M x k (documents x topics). Must be non-negative and sum to 1 within documents. To generally define document topic probabilities, use `alpha`.
`seed`	Input to `set.seed`.
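
The constraints listed for the arguments can be checked before simulating. The helper below is a minimal sketch, assuming a numerical tolerance `tol` for the sum-to-1 conditions. The name `check_lda_inputs` is hypothetical, and the checks LDATS performs internally may differ.

```r
# Hypothetical input checker based on the argument descriptions above;
# not part of LDATS.
check_lda_inputs <- function(N, Beta, alpha = NULL, Theta = NULL, tol = 1e-8) {
  # N: non-negative whole-number document sizes
  stopifnot(is.numeric(N), all(N >= 0), all(N == round(N)))
  # Beta: k x V, non-negative, each topic (row) sums to 1
  stopifnot(is.matrix(Beta), all(Beta >= 0),
            all(abs(rowSums(Beta) - 1) < tol))
  if (!is.null(Theta)) {
    # Theta: M x k, non-negative, each document (row) sums to 1
    stopifnot(is.matrix(Theta), all(Theta >= 0),
              nrow(Theta) == length(N), ncol(Theta) == nrow(Beta),
              all(abs(rowSums(Theta) - 1) < tol))
  } else {
    # Without Theta, a single positive alpha is required
    stopifnot(is.numeric(alpha), length(alpha) == 1, alpha > 0)
  }
  invisible(TRUE)
}

N <- c(10, 22, 15, 31)
Beta <- matrix(c(0.1, 0.1, 0.8, 0.2, 0.6, 0.2), 2, 3, byrow = TRUE)
check_lda_inputs(N, Beta, alpha = 1.2)
```

The helper stops with an error at the first violated condition and otherwise returns `TRUE` invisibly.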

Value

A document-by-term matrix of counts (dim: M x V).

Examples

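  library(LDATS)
  # Four documents with 10, 22, 15, and 31 total words
  N <- c(10, 22, 15, 31)
  alpha <- 1.2
  # Two topics (rows) over three terms (columns); each row sums to 1
  Beta <- matrix(c(0.1, 0.1, 0.8, 0.2, 0.6, 0.2), 2, 3, byrow = TRUE)
  # Define document topic probabilities generally through alpha
  sim_LDA_data(N, Beta, alpha = alpha)
  # Or define them specifically through Theta (4 documents x 2 topics)
  Theta <- matrix(c(0.2, 0.8, 0.8, 0.2, 0.5, 0.5, 0.9, 0.1), 4, 2,
                  byrow = TRUE)
  sim_LDA_data(N, Beta, Theta = Theta)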

LDATS documentation built on Sept. 19, 2023, 5:08 p.m.