nubbi.collapsed.gibbs.sampler: Collapsed Gibbs Sampling for the Networks Uncovered By...
In lda: Collapsed Gibbs Sampling Methods for Topic Models

nubbi.collapsed.gibbs.sampler

R Documentation

Collapsed Gibbs Sampling for the Networks Uncovered By Bayesian Inference (NUBBI) Model.

Description

Fit a NUBBI model, which takes as input a collection of entities with corresponding textual descriptions as well as a set of descriptions for pairs of entities. The NUBBI model then produces a latent space description of both the entities and the relationships between them.

Usage

nubbi.collapsed.gibbs.sampler(contexts, pair.contexts, pairs, K.individual,
                              K.pair, vocab, num.iterations, alpha, eta, xi)

Arguments

`contexts`	The set of textual descriptions (i.e., documents) for individual entities in LDA format (see `lda.collapsed.gibbs.sampler` for details).
`pair.contexts`	A set of textual descriptions for pairs of entities, also in LDA format.
`pairs`	Labels indicating which pair of entities each element of `pair.contexts` refers to. This parameter should be an integer matrix with two columns and the same number of rows as there are elements in `pair.contexts`. The two elements in each row of `pairs` are 0-indexed indices into `contexts` indicating which two entities that element of `pair.contexts` describes. Note that this must be an `integer` and not a `numeric` matrix.
`K.individual`	A scalar integer representing the number of topics for the individual entities.
`K.pair`	A scalar integer representing the number of topics for entity pairs.
`vocab`	A character vector specifying the vocabulary words associated with the word indices used in `contexts` and `pair.contexts`.
`num.iterations`	The number of sweeps of Gibbs sampling over the entire corpus to make.
`alpha`	The scalar value of the Dirichlet hyperparameter for topic proportions.
`eta`	The scalar value of the Dirichlet hyperparameter for topic multinomials.
`xi`	The scalar value of the Dirichlet hyperparameter for source proportions.
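The following is a minimal sketch of how the arguments above fit together on a toy corpus. It assumes the LDA format described in `lda.collapsed.gibbs.sampler` (as understood here: each document is a two-row integer matrix, with 0-indexed word indices in the first row and counts in the second). The helper `make_doc`, the toy vocabulary, and the hyperparameter values are illustrative assumptions, not recommendations.

```r
# Toy vocabulary; word indices in the documents are 0-based positions in it.
vocab <- c("alice", "bob", "carol", "met")

# Hypothetical helper: build one LDA-format document from 0-based word
# indices and counts (assumed layout: row 1 = indices, row 2 = counts).
make_doc <- function(word_ids, counts) {
  rbind(as.integer(word_ids), as.integer(counts))
}

# One description per individual entity (entities 0, 1 and 2).
contexts <- list(
  make_doc(c(0, 3), c(2, 1)),
  make_doc(c(1, 3), c(1, 1)),
  make_doc(2, 3)
)

# One description per entity pair.
pair.contexts <- list(
  make_doc(c(0, 1, 3), c(1, 1, 2)),
  make_doc(c(1, 2), c(1, 1))
)

# Row i names the two entities (0-indexed into `contexts`) that
# pair.contexts[[i]] describes. The L suffix keeps the matrix integer.
pairs <- matrix(c(0L, 1L,
                  1L, 2L), ncol = 2, byrow = TRUE)

# Sanity checks mirroring the argument descriptions.
stopifnot(is.integer(pairs),
          ncol(pairs) == 2,
          nrow(pairs) == length(pair.contexts),
          all(pairs >= 0L), all(pairs < length(contexts)))

# Only call the sampler when the lda package is installed.
if (requireNamespace("lda", quietly = TRUE)) {
  fit <- lda::nubbi.collapsed.gibbs.sampler(
    contexts, pair.contexts, pairs,
    K.individual = 2L, K.pair = 2L, vocab = vocab,
    num.iterations = 10L, alpha = 0.1, eta = 0.1, xi = 0.1
  )
  str(fit$document_source_sums)
}
```

Building `pairs` with `matrix(c(0, 1, 1, 2), ...)` (no `L` suffix) would yield a `numeric` matrix, which is the case the `pairs` description warns against.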

Details

The NUBBI model is a switching model wherein the description of each entity-pair can be ascribed to either the first entity of the pair, the second entity of the pair, or their relationship. The NUBBI model posits a latent space (i.e., topic model) over the individual entities, and a different latent space over entity relationships.
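To make the switching idea concrete, the sketch below simulates only the source choice for the words of a single pair description. It assumes that the pair's source proportions are drawn from a symmetric Dirichlet with parameter `xi` and that each word then picks one of the three sources independently; the helper `rdirichlet_sym` and all values are hypothetical, and the topic and word draws that would follow are omitted.

```r
set.seed(1)

# Hypothetical helper: one draw from a symmetric Dirichlet distribution,
# obtained by normalizing independent gamma variates.
rdirichlet_sym <- function(k, concentration) {
  g <- rgamma(k, shape = concentration, rate = 1)
  g / sum(g)
}

xi <- 1.0        # illustrative value for the source-proportion hyperparameter
n_words <- 8L    # number of words in this pair description

# Proportions over the three possible sources for this pair.
source_props <- rdirichlet_sym(3L, xi)
names(source_props) <- c("first_entity", "second_entity", "relationship")

# Each word switches to one source, coded 0, 1 or 2.
word_sources <- sample(0:2, size = n_words, replace = TRUE,
                       prob = source_props)

print(round(source_props, 3))
print(word_sources)
```

The 0/1/2 coding mirrors the one used for `source_assignments` in the returned model.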

The collapsed Gibbs sampler used here differs from the variational inference method proposed in the paper and should be considered highly experimental.

Value

A fitted model as a list with the same components as returned by `lda.collapsed.gibbs.sampler`, with the following additional components:

`source_assignments`	A list of `length(pair.contexts)` whose elements `source_assignments[[i]]` are of the same length as `pair.contexts[[i]]` where each entry is either 0 if the sampler assigned the word to the first entity, 1 if the sampler assigned the word to the second entity, or 2 if the sampler assigned the word to the relationship between the two.
`document_source_sums`	A matrix with three columns and `length(pair.contexts)` rows where each row indicates how many words were assigned to the first entity of the pair, the second entity of the pair, and the relationship between the two, respectively.
`document_sums`	Semantically similar to the entry in `lda.collapsed.gibbs.sampler`, except that it is a list whose first `length(contexts)` entries correspond to the columns of the entry in `lda.collapsed.gibbs.sampler` for the individual contexts, and the remaining `length(pair.contexts)` entries correspond to the columns for the pair contexts.
`topics`	Like the entry in `lda.collapsed.gibbs.sampler`, except that it contains the concatenation of the `K.individual` topics and the `K.pair` topics.
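As a rough illustration of how `source_assignments` relates to `document_source_sums`, the sketch below tabulates the 0/1/2 codes per pair description. The input is mock data rather than real sampler output, and `count_sources` is a hypothetical helper. Whether this tabulation reproduces `document_source_sums` exactly when a word has a count greater than one is an assumption that should be checked against a real fit.

```r
# Mock source assignments for two pair descriptions (illustration only):
# 0 = first entity, 1 = second entity, 2 = relationship.
source_assignments <- list(
  c(0L, 0L, 2L, 1L),
  c(2L, 2L, 1L)
)

# Hypothetical helper: count how often each source code occurs per document.
count_sources <- function(assignments) {
  counts <- t(vapply(assignments,
                     function(a) tabulate(a + 1L, nbins = 3L),
                     integer(3)))
  colnames(counts) <- c("first_entity", "second_entity", "relationship")
  counts
}

source_counts <- count_sources(source_assignments)
print(source_counts)
```

Each row of `source_counts` sums to the number of assignments in the corresponding pair description, which makes for a quick consistency check.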

Note

The underlying sampler is quite general and could potentially be used for other models such as the author-topic model (McCallum et al.) and the citation influence model (Dietz et al.). Please examine the source code and/or contact the author(s) for further details.