predictLDA: Estimate topics for new documents using a Gibbs sampler


Description

This function estimates topic proportions for a new corpus of documents, using the vocabulary and the topic-token probability distributions from a previously fit LDA topic model. The function samples the latent topics for each token in the new corpus using a Gibbs sampler, and returns the latent topics from the last iteration.

Usage

predictLDA(word.id = integer(), doc.id = integer(), k = 10,
  n.chains = 1, n.iter = 1000, topics.init = NULL, alpha = 0.01, phi)

Arguments

word.id

Unique token ID. Can be taken directly from the output of filter.

doc.id

Unique document ID. Can be taken directly from the output of filter.

k

Number of topics.

n.chains

Number of MCMC chains.

n.iter

Number of iterations of the Gibbs sampler.

topics.init

A vector of topics to initially assign. The Markov property of MCMC allows one to input the topic assignments from the last iteration of a previous model fit. Note that this vector should be the same length as the word.id vector times the number of chains.

alpha

Dirichlet hyperparameter for the document-topic distributions.

phi

The T x W matrix containing the topic-token probability distributions for each of the T topics in the previously fit topic model.

Value

A list of length two. The first element is the sampled latent topic value from the last iteration (for each token). The second element is a vector with the log-likelihood values for every iteration of the Gibbs sampler.

Examples

data(APinput)
# takes a while
## Not run: o <- fitLDA(APinput$word.id, APinput$doc.id, k = 20)
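The example above fits a model but never calls predictLDA itself. A minimal sketch of predicting topics for a new corpus from that fit might look like the following; the element name o$phi.hat is an assumption (the exact name of the topic-token matrix returned by fitLDA may differ), and new.word.id / new.doc.id stand in for token and document IDs of the new corpus:

```r
## Not run:
# Predict topic assignments for a new corpus using the
# topic-token distributions from the previous fit.
# o$phi.hat is an assumed element name for the T x W phi matrix.
p <- predictLDA(word.id = new.word.id, doc.id = new.doc.id,
                k = 20, n.chains = 1, n.iter = 500,
                alpha = 0.01, phi = o$phi.hat)
str(p)  # a list of two: last-iteration topics and per-iteration log-likelihoods
## End(Not run)
```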

kshirley/LDAtools documentation built on May 20, 2019, 7:03 p.m.