em_plsv: EM posterior maximization for PLSV.
In NathanWycoff/iPLSV:


Perform the EM algorithm described in Iwata et al. to estimate a PLSV model.

em_plsv(docs, K, V, P, eta, gamma, beta, make_plot = FALSE,
  THETA_init = NULL, PSI_init = NULL, PHI_init = NULL,
  THETA_fix = list(), PSI_fix = list(), verbose = FALSE, thresh = 0.01,
  max_iters = 1000, lik_grad = "R")

`docs`	A term frequency matrix, that is, one with a row for each document, a column for each vocab word, and integer entries giving the number of occurrences of a word in a doc.
`K`	The number of topics, an integer scalar.
`V`	The number of unique words, an integer scalar.
`P`	The dimensionality of the embedding space, an integer, usually 2.
`eta`	The exchangeable Dirichlet prior on words in a topic.
`beta`	The precision for topic locations, a positive scalar.
`make_plot`	A boolean, if TRUE, will make a ggplot visualization of the topics and documents, with topics in red.
`THETA_init`	Either a real matrix with as many rows as docs has and P many columns, giving an initial value for THETA, or a scalar character, either 'smart' or 'random'. See details.
`PSI_init`	Either a real matrix with K many rows and P many columns, giving an initial value for PSI, or a scalar character, either 'smart' or 'random'. See details.
`PHI_init`	Either a matrix of K many V-1-simplex valued rows, giving the initial value for PHI, or a scalar character, either 'smart' or 'random'. See details.
`THETA_fix`	A list of lists, used to fix rows of THETA to a given value. Each sublist has two elements: 'ind' and 'val'. 'ind' indicates the row (1-indexed) of THETA to fix, and 'val', a real-valued P-vector, indicates the value to fix it to.
`verbose`	Boolean; if TRUE, reports the largest change in the on-screen coordinates.
`thresh`	The convergence threshold for the EM algorithm as a whole; the algorithm stops once the largest absolute change in the on-screen coordinates falls below this value.
`max_iters`	The maximum number of EM iterations allowed, an integer scalar.
`lik_grad`	A character scalar, one of 'R' or 'Cpp'. Functions are written in both languages for the likelihood and gradient. Cpp is much faster. This option will be removed once Cpp is confirmed to work.
`gamma`	The precision for document locations, a positive scalar.
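
The shapes described for `docs` and `THETA_fix` can be sketched in base R as follows. This is only an illustration: the toy counts, the vocabulary, and the choice of which documents to pin are made up, and the checks at the end are assumptions about what a well-formed input looks like, not checks taken from the package.

```r
# Toy corpus (made-up counts): 3 documents over a 4-word vocabulary.
docs <- matrix(
  c(2L, 0L, 1L, 0L,
    0L, 3L, 0L, 1L,
    1L, 1L, 0L, 2L),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("doc", 1:3), c("apple", "bird", "cat", "dog"))
)

V <- ncol(docs)  # number of unique words
P <- 2L          # dimensionality of the embedding space

# Fix document 1 at the origin and document 3 at (1, 0).
# Each sublist holds 'ind' (1-indexed row of THETA) and 'val' (a P-vector).
THETA_fix <- list(
  list(ind = 1L, val = c(0, 0)),
  list(ind = 3L, val = c(1, 0))
)

# Sanity checks (assumed, not part of the package):
# counts are non-negative integers, and every fixed value has length P
# and points at an existing row of docs.
stopifnot(is.integer(docs), all(docs >= 0L))
stopifnot(all(vapply(THETA_fix, function(f) length(f$val) == P, logical(1))))
stopifnot(all(vapply(THETA_fix, function(f) f$ind >= 1L && f$ind <= nrow(docs), logical(1))))
```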

A list containing ests, itself a list with PHI, the topic-by-word matrix (K rows, each on the V-1 simplex); THETA, the document locations in P-dimensional space; and PSI, the topic locations in P-dimensional space.
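
A minimal sketch of a call and of reading the returned list is given here. The hyperparameter values (`eta`, `gamma`, `beta`), the toy counts, and the reduced `max_iters` are arbitrary choices for illustration, and the sketch assumes that `em_plsv` is exported from the iPLSV namespace and that a scalar `eta` is accepted; the call is skipped when the package is not installed.

```r
# Toy term frequency matrix (made-up counts): 4 documents, 5 vocab words.
docs <- matrix(
  c(3L, 1L, 0L, 0L, 2L,
    0L, 2L, 4L, 1L, 0L,
    1L, 0L, 0L, 3L, 1L,
    2L, 2L, 1L, 0L, 0L),
  nrow = 4, byrow = TRUE
)

fit <- NULL
if (requireNamespace("iPLSV", quietly = TRUE)) {
  # Hyperparameter values below are illustrative assumptions.
  fit <- iPLSV::em_plsv(
    docs, K = 2, V = ncol(docs), P = 2,
    eta = 2, gamma = 1, beta = 1,
    THETA_init = "random", PSI_init = "random", PHI_init = "random",
    max_iters = 50, lik_grad = "R"
  )

  # Estimates live in the 'ests' element of the returned list.
  print(dim(fit$ests$THETA))  # document locations
  print(dim(fit$ests$PSI))    # topic locations
  print(dim(fit$ests$PHI))    # topic word distributions
}
```

With the 'random' initializations the result may differ between runs; calling `set.seed()` beforehand should make a run repeatable if the package draws its initial values from R's random number generator.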

NathanWycoff/iPLSV documentation built on May 16, 2019, 11:10 p.m.