Collapsed Gibbs Sampling for the Relational Topic Model (RTM).

Description

Fit a generative topic model that accounts for both the words occurring in a collection of documents and the links between those documents.

Usage

rtm.collapsed.gibbs.sampler(documents, links, K, vocab, num.iterations,
  alpha, eta, beta, trace = 0L, test.start = length(documents) + 1L)

rtm.em(documents, links, K, vocab, num.e.iterations, num.m.iterations,
  alpha, eta,
  lambda = sum(sapply(links, length))/(length(links) * (length(links) - 1)/2),
  initial.beta = rep(3, K), trace = 0L,
  test.start = length(documents) + 1L, tempering = 0.0)

Arguments

documents

A collection of documents in LDA format. See lda.collapsed.gibbs.sampler for details.
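As a quick reminder of that format: each document is a two-row integer matrix whose first row holds 0-indexed word indices into vocab and whose second row holds the corresponding counts. The base-R sketch below builds two such documents by hand. The vocabulary, the tokens, and the helper to_lda_doc are made up for illustration and are not part of the package.

```r
# Illustrative vocabulary (an assumption, not shipped with the package).
vocab <- c("topic", "model", "link", "graph")

# Hypothetical helper: convert a character vector of tokens into a
# 2-row integer matrix (row 1: 0-indexed word ids, row 2: counts).
to_lda_doc <- function(tokens, vocab) {
  counts <- tabulate(match(tokens, vocab), nbins = length(vocab))
  present <- which(counts > 0)
  rbind(present - 1L, counts[present])
}

documents <- list(
  to_lda_doc(c("topic", "model", "topic"), vocab),
  to_lda_doc(c("link", "graph", "link", "model"), vocab)
)

documents[[1]]
```

In practice the package's own preprocessing helpers are the usual way to obtain this structure; the sketch only makes the layout concrete.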

links

A list representing the connections between the documents. This list should have the same length as documents. Each element, links[[i]], is an integer vector of 0-indexed document indices identifying the documents to which document i is connected.
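The following base-R sketch shows one way to build such a list from an edge list. The edge list, the document count, and the choice to record each edge once (under its first endpoint) are assumptions made for illustration; this page does not say whether links must also be listed in the reverse direction.

```r
# Hypothetical edge list using 1-based document numbers (the usual R convention).
edges <- rbind(c(1, 2), c(1, 3), c(2, 4))
n_docs <- 4L

# links[[i]] holds the 0-indexed targets of document i, so subtract 1.
links <- lapply(seq_len(n_docs), function(i) {
  as.integer(edges[edges[, 1] == i, 2] - 1)
})

links[[1]]  # documents 2 and 3, stored as 1 and 2
```

Documents without outgoing links get an empty integer vector, which keeps the list the same length as documents.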

K

A scalar integer indicating the number of latent topics for the model.

vocab

A character vector specifying the vocabulary words associated with the word indices used in documents.

num.iterations

The number of sweeps of Gibbs sampling over the entire corpus to make.

num.e.iterations

For rtm.em, the number of iterations in each Gibbs sampling E-step.

num.m.iterations

For rtm.em, the number of M-step iterations.

alpha

The scalar value of the Dirichlet hyperparameter for topic proportions.

eta

The scalar value of the Dirichlet hyperparameter for topic multinomials.

beta

A numeric vector of length K containing regression coefficients that express the relationship between each topic and the probability of a link.

lambda

For rtm.em, the regularization parameter used when estimating beta. lambda expresses the number of non-links to simulate among all possible connections between documents.
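To make the default in the Usage section concrete, the sketch below evaluates the same expression on a small, made-up links list. Assuming each link is listed once, the default is the total number of links divided by the number of distinct document pairs.

```r
# Toy links list for four documents (illustrative values only).
links <- list(c(1L, 2L), 3L, integer(0), integer(0))

n_links <- sum(sapply(links, length))                # 2 + 1 + 0 + 0 = 3
n_pairs <- length(links) * (length(links) - 1) / 2   # 4 * 3 / 2 = 6

# Same expression as the default value of lambda in rtm.em.
lambda <- n_links / n_pairs
lambda  # 0.5
```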

initial.beta

For rtm.em, an initial value of beta at which to start the EM process.

trace

When trace is greater than zero, diagnostic messages will be output. Larger values of trace imply more messages.

test.start

Internal use only.

tempering

A numeric between 0 and 1 indicating how newly computed parameters should be averaged with the previous iteration's parameters. By default (tempering = 0.0), the new values are used directly and the old values are discarded. When set to 1, the new values are ignored and the initial values are retained indefinitely.
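One averaging rule consistent with that description is a simple convex combination of old and new values. The sketch below is an assumption about the form of the update, not a transcription of the package internals; temper is a hypothetical helper name.

```r
# Hypothetical helper: blend old and new parameter values.
# tempering = 0 returns `new`; tempering = 1 returns `old`.
temper <- function(old, new, tempering = 0.0) {
  stopifnot(tempering >= 0, tempering <= 1)
  tempering * old + (1 - tempering) * new
}

old_beta <- rep(3, 4)
new_beta <- c(1, 2, 3, 4)

temper(old_beta, new_beta, 0)    # new values used directly
temper(old_beta, new_beta, 1)    # old values retained
temper(old_beta, new_beta, 0.5)  # halfway between the two
```

Intermediate values damp the change in beta from one M-step to the next, which may help when the estimates oscillate.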

Details

The Relational Topic Model uses LDA to model the content of documents and additionally models the connections between documents as dependent on the similarity of their distributions of latent topic assignments. (See the reference for details.)

Only the exponential link probability function is implemented here. Note that the collapsed Gibbs sampler differs from the variational inference procedure proposed in the paper and should be considered highly experimental.

rtm.em provides an EM-wrapper around rtm.collapsed.gibbs.sampler which iteratively estimates the regression parameters beta.

Value

A fitted model as a list with the same components as returned by lda.collapsed.gibbs.sampler.

Author(s)

Jonathan Chang (slycoder@gmail.com)

References

Chang, Jonathan and Blei, David M. Relational Topic Models for Document Networks. Artificial Intelligence and Statistics. 2009.

See Also

See lda.collapsed.gibbs.sampler for a description of the input formats and similar models.

nubbi.collapsed.gibbs.sampler is a different kind of model for document networks.

predictive.link.probability makes predictions based on the output of this model.

Examples

## See demo.

## Not run: demo(rtm)
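For readers who want a starting point without running the full demo, the following is a minimal sketch of a call using the argument order from the Usage section. It assumes the lda package is installed and uses the Cora data sets bundled with it (cora.documents, cora.cites, cora.vocab). The values of K, the number of sweeps, alpha, eta, and beta are illustrative choices, not recommendations.

```r
fit <- NULL

# Only run when the lda package is available.
if (requireNamespace("lda", quietly = TRUE)) {
  data(cora.documents, package = "lda")
  data(cora.cites, package = "lda")
  data(cora.vocab, package = "lda")

  K <- 10  # illustrative number of topics

  fit <- lda::rtm.collapsed.gibbs.sampler(
    cora.documents, cora.cites, K, cora.vocab,
    35,          # num.iterations (illustrative)
    0.1, 0.1,    # alpha, eta (illustrative)
    rep(3, K)    # beta: one coefficient per topic (illustrative)
  )

  # Inspect the top-level components of the fitted model.
  str(fit, max.level = 1)
}
```

The returned list can then be passed to functions such as predictive.link.probability, as noted under See Also.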