em_plsv: EM posterior maximization for PLSV.


Description

Performs the EM algorithm described in Iwata et al. to estimate a PLSV model.

Usage

em_plsv(docs, K, V, P, eta, gamma, beta, make_plot = FALSE,
  THETA_init = NULL, PSI_init = NULL, PHI_init = NULL,
  THETA_fix = list(), PSI_fix = list(), verbose = FALSE, thresh = 0.01,
  max_iters = 1000, lik_grad = "R")

Arguments

docs

A term frequency matrix, that is, one with a row for each document, a column for each vocabulary word, and integer entries giving the number of occurrences of that word in that document.
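As an illustration of the layout described above, the following sketch builds a small term frequency matrix in base R. The toy corpus and the whitespace tokenization are assumptions made for this example only; they are not part of the package.

```r
# Toy corpus (illustrative data only), tokenized by splitting on spaces.
texts  <- c("apple banana apple", "banana cherry", "cherry cherry apple")
tokens <- strsplit(texts, " ", fixed = TRUE)
vocab  <- sort(unique(unlist(tokens)))

# Count each vocabulary word per document: one row per document,
# one column per vocabulary word, integer entries.
docs <- t(vapply(
  tokens,
  function(tok) tabulate(match(tok, vocab), nbins = length(vocab)),
  integer(length(vocab))
))
colnames(docs) <- vocab

# The number of unique words is the number of columns.
V <- ncol(docs)
```

Here docs has three rows and V is 3, one column each for "apple", "banana" and "cherry".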

K

The number of topics, an integer scalar.

V

The number of unique words, an integer scalar.

P

The dimensionality of the embedding space, an integer, usually 2.

eta

The exchangeable Dirichlet prior on words in a topic.

beta

The precision for topic locations, a positive scalar.

make_plot

A boolean; if TRUE, a ggplot visualization of the topics and documents is drawn, with topics in red.

THETA_init

Either a real matrix with as many rows as docs and P columns, giving an initial value for THETA, or a scalar character, either 'smart' or 'random'. See details.

PSI_init

Either a real matrix with K many rows and P many columns, giving an initial value for PSI, or a scalar character, either 'smart' or 'random'. See details.

PHI_init

Either a matrix of K many V-1-simplex valued rows, giving the initial value for PHI, or a scalar character, either 'smart' or 'random'. See details.
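A matrix whose rows lie on the (V-1)-simplex has nonnegative entries and rows that each sum to 1. The sketch below constructs one such matrix by normalizing gamma draws; this particular construction is an assumption chosen for illustration and is not necessarily what the 'random' option does.

```r
set.seed(1)
K <- 3  # number of topics (example value)
V <- 5  # number of unique words (example value)

# Positive random draws, one row per topic.
raw <- matrix(rgamma(K * V, shape = 1), nrow = K, ncol = V)

# Divide every row by its sum so each row is a probability vector.
PHI_init <- raw / rowSums(raw)
```

Each row of PHI_init can then be read as a distribution over the V words for one topic.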

THETA_fix

A list of lists, used to fix rows of THETA to given values. Each sublist has two elements: 'ind' and 'val'. 'ind' indicates the row (1-indexed) of THETA to fix, and 'val', a real-valued P-vector, gives the value to fix it to.
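The list-of-lists format described above might look like the following for P = 2. The row indices, coordinates and the document count n_docs are made-up example values, and the validity check is only a suggestion for catching malformed entries before calling the function.

```r
P <- 2       # embedding dimension (example value)
n_docs <- 5  # number of rows in docs (hypothetical)

# Pin document 1 to the origin and document 4 to (1.5, -0.5).
THETA_fix <- list(
  list(ind = 1, val = c(0, 0)),
  list(ind = 4, val = c(1.5, -0.5))
)

# Sanity check: every entry names a valid row and carries a numeric P-vector.
fix_ok <- all(vapply(THETA_fix, function(f) {
  f$ind >= 1 && f$ind <= n_docs && is.numeric(f$val) && length(f$val) == P
}, logical(1)))
```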

verbose

Boolean; if TRUE, reports the largest jump in the on-screen coordinates.

thresh

The convergence threshold for the EM algorithm as a whole: the algorithm stops once the largest absolute difference between on-screen coordinates is less than this value.
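The stopping rule can be pictured as a comparison of coordinates before and after an iteration. The sketch below assumes the "on-screen coordinates" are the rows of an embedding matrix such as THETA; max_coord_change is a hypothetical helper written for this example, not a function from the package.

```r
# Largest absolute change in any coordinate between two iterations.
max_coord_change <- function(old, new) max(abs(new - old))

# Two documents in 2-D, before and after one (made-up) update.
THETA_old <- matrix(c(0, 0,
                      1, 1), nrow = 2, byrow = TRUE)
THETA_new <- THETA_old + matrix(c(0.004, -0.002,
                                  0.001,  0.009), nrow = 2, byrow = TRUE)

thresh <- 0.01
# Stop when no coordinate moved by as much as thresh.
converged <- max_coord_change(THETA_old, THETA_new) < thresh
```

In this example the largest move is about 0.009, which is below the default thresh of 0.01, so the check would signal convergence.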

max_iters

The maximum number of EM iterations allowed, an integer scalar.

lik_grad

A character scalar, one of 'R' or 'Cpp'. Functions are written in both languages for the likelihood and gradient. Cpp is much faster. This option will be removed once Cpp is confirmed to work.

gamma

The precision for document locations, a positive scalar.

Value

A list containing ests, itself a list with three elements: PHI, the topic-by-word matrix (K rows, each a distribution over the V words); THETA, the document locations in P-dimensional space; and PSI, the topic locations in P-dimensional space.
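Putting the pieces together, a call might look like the sketch below. It assumes the package is installed under the name iPLSV and exports em_plsv; the hyperparameter values for eta, gamma and beta are arbitrary placeholders, and the toy count matrix is invented for illustration. The call is guarded so the snippet still runs when the package is absent.

```r
# Toy term frequency matrix: 4 documents, 3 vocabulary words (made-up counts).
docs <- matrix(c(2L, 1L, 0L,
                 0L, 1L, 1L,
                 1L, 0L, 2L,
                 3L, 0L, 1L), nrow = 4, byrow = TRUE)
K <- 2           # number of topics
V <- ncol(docs)  # number of unique words
P <- 2           # embedding dimension

if (requireNamespace("iPLSV", quietly = TRUE)) {
  # Placeholder hyperparameters; tune for real data.
  fit <- iPLSV::em_plsv(docs, K = K, V = V, P = P,
                        eta = 2, gamma = 0.1, beta = 0.1)
  # Estimates are returned in the 'ests' element.
  str(fit$ests$PHI)    # topic-by-word matrix
  str(fit$ests$THETA)  # document locations
  str(fit$ests$PSI)    # topic locations
}
```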


NathanWycoff/iPLSV documentation built on May 16, 2019, 11:10 p.m.