Predict the response variable of documents using an sLDA model.

Description

These functions take a fitted sLDA model and predict the value of the response variable (or document-topic sums) for each given document.

Usage

slda.predict(documents, topics, model, alpha, eta,
num.iterations = 100, average.iterations = 50, trace = 0L)

slda.predict.docsums(documents, topics, alpha, eta,
num.iterations = 100, average.iterations = 50, trace = 0L)

Arguments

documents

A list of document matrices comprising a corpus, in the format described in lda.collapsed.gibbs.sampler.

topics

A K × V matrix in which each entry is the integer number of times the word (column) has been allocated to the topic (row). A normalised version of this matrix is sometimes denoted β_{w,k} in the literature; see Details. The column names should correspond to the words in the vocabulary. The topics field from the output of slda.em can be used.

model

A fitted model relating a document's topic distribution to the response variable. The model field from the output of slda.em can be used.

alpha

The scalar value of the Dirichlet hyperparameter for topic proportions. See references for details.

eta

The scalar value of the Dirichlet hyperparameter for topic multinomials.

num.iterations

Number of iterations of inference to perform on the documents.

average.iterations

Number of samples to average over to produce the predictions.

trace

When trace is greater than zero, diagnostic messages will be output. Larger values of trace imply more messages.
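
The topics argument is a matrix of raw counts, whereas the literature usually works with a normalised version, β_{w,k}. The base R sketch below shows one way to get from one to the other. The toy counts are invented for illustration, and smoothing each count by eta before normalising is an assumed convention, not something this page confirms about the package internals.

```r
# Toy K x V topic-word count matrix (K = 2 topics, V = 3 words).
# The counts and vocabulary are made up for illustration.
topics <- matrix(c(4L, 0L, 1L,
                   1L, 3L, 1L),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(NULL, c("apple", "bank", "court")))

eta <- 0.1  # assumed smoothing value, playing the role of the eta argument

# Add eta to every count, then divide each row by its total so that
# every topic (row) becomes a probability distribution over words.
beta <- (topics + eta) / rowSums(topics + eta)

print(round(beta, 3))
```

Each row of beta sums to one, so a row can be read as the word distribution of the corresponding topic.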

Details

Inference is first performed on the documents using Gibbs sampling while holding the word-topic matrix β_{w,k} constant. For a well-fit model, a small number of iterations typically suffices to obtain good fits for new documents. The resulting topic vectors are then passed through model to yield a numeric prediction for each document.
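
The second step, turning per-document topic vectors into numeric predictions, can be sketched in base R. This is a toy illustration only: the count matrix and coefficients are hypothetical, and it assumes the response is linear in the topic proportions with one coefficient per topic, which may differ from what the fitted model object actually does.

```r
# Hypothetical K x N matrix of topic assignment counts
# (K = 3 topics in rows, N = 2 documents in columns).
doc_sums <- matrix(c(6, 3, 1,
                     0, 2, 8), nrow = 3)

# Convert counts to topic proportions, one row per document (N x K).
props <- t(doc_sums) / colSums(doc_sums)

# Hypothetical regression coefficients, one per topic.
coefs <- c(-1, 0, 1)

# Assumed linear model: prediction = proportions %*% coefficients.
predictions <- drop(props %*% coefs)
print(predictions)
```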

Value

For slda.predict, a numeric vector of the same length as documents giving the predictions. For slda.predict.docsums, a K × N matrix of document assignment counts, where N is the number of documents.

Author(s)

Jonathan Chang (slycoder@gmail.com)

References

Blei, David M. and McAuliffe, John. Supervised topic models. Advances in Neural Information Processing Systems, 2008.

See Also

See lda.collapsed.gibbs.sampler for a description of the format of the input data, as well as more details on the model.

See predictive.distribution if you want to make predictions about the contents of the documents instead of the response variables.

Examples

## The sLDA demo shows an example usage of this function.
## Not run: demo(slda)
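
As a complement to the demo, the following is a minimal sketch of a fit-then-predict round trip. It assumes the lda package is installed and provides slda.em together with the poliblog.documents, poliblog.vocab and poliblog.ratings example datasets. The slda.em arguments and every parameter value (K, alpha, eta, variance, lambda, iteration counts) are illustrative assumptions rather than recommendations from this page.

```r
# Sketch only: runs when the lda package (assumed to provide slda.em and the
# poliblog example data) is installed; otherwise it does nothing.
if (requireNamespace("lda", quietly = TRUE)) {
  data(poliblog.documents, poliblog.vocab, poliblog.ratings, package = "lda")

  set.seed(1)
  K <- 10  # assumed number of topics

  # Fit a supervised LDA model (argument values are illustrative).
  fit <- lda::slda.em(documents = poliblog.documents, K = K,
                      vocab = poliblog.vocab,
                      num.e.iterations = 10, num.m.iterations = 4,
                      alpha = 1.0, eta = 0.1,
                      annotations = poliblog.ratings / 100,
                      params = sample(c(-1, 1), K, replace = TRUE),
                      variance = 0.25, lambda = 1.0,
                      logistic = FALSE, method = "sLDA")

  # Predict the response for each document using the fitted topics and model.
  predictions <- lda::slda.predict(poliblog.documents, fit$topics, fit$model,
                                   alpha = 1.0, eta = 0.1)

  # Alternatively, obtain only the per-document topic assignment counts.
  doc_sums <- lda::slda.predict.docsums(poliblog.documents, fit$topics,
                                        alpha = 1.0, eta = 0.1)

  str(predictions)
  print(dim(doc_sums))
}
```

The same alpha and eta are passed to prediction as were used for fitting, so that inference on the documents uses the same priors as training.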