In lda: Collapsed Gibbs Sampling Methods for Topic Models

predictive.link.probability

R Documentation

Use the RTM to predict whether a link exists between two documents.

Description

This function takes a fitted LDA-type model (e.g., LDA or RTM) and makes predictions about the likelihood of a link existing between pairs of documents.

Usage

predictive.link.probability(edgelist, document_sums, alpha, beta)

Arguments

`edgelist`	A two-column integer matrix where each row represents an edge on which to make a prediction. An edge is expressed as a pair of integer indices (1-indexed) into the columns (i.e., documents) of `document_sums` (see below).
`document_sums`	A `K \times D` matrix where each entry is a numeric value proportional to the probability of seeing a topic (row) conditioned on the document (column). This entry is sometimes denoted `\theta_{d,k}` in the literature; see Details. The `document_sums` field or the `document_expects` field from the output of `lda.collapsed.gibbs.sampler` and `rtm.collapsed.gibbs.sampler` can be used.
`alpha`	The value of the Dirichlet hyperparameter generating the distribution over `document_sums`. This, in effect, smooths the similarity between documents.
`beta`	A numeric vector of regression weights which is used to determine the similarity between two vectors (see details). Arguments will be recycled to create a vector of length `dim(document_sums)[1]`.
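
As a rough illustration of the argument shapes described above, the following sketch builds toy inputs in base R. The counts, the values chosen for `alpha` and `beta`, and the variable names are assumptions made for illustration only, not output from a fitted model. The call itself is attempted only if the lda package is installed.

```r
# Toy inputs (illustrative values, not from a fitted model).
K <- 3  # number of topics
D <- 4  # number of documents

# K x D matrix of topic counts (rows = topics, columns = documents).
# matrix() fills column by column, so each group of three is one document.
document_sums <- matrix(c(5, 1, 0,
                          4, 2, 0,
                          0, 1, 6,
                          1, 0, 7),
                        nrow = K, ncol = D)

# Two-column integer matrix; each row is a candidate edge between two
# documents, given as 1-indexed column indices into document_sums.
edgelist <- matrix(c(1L, 2L,
                     1L, 3L,
                     3L, 4L),
                   ncol = 2, byrow = TRUE)

# Sanity checks on the shapes described in the Arguments section.
stopifnot(is.integer(edgelist), ncol(edgelist) == 2)
stopifnot(all(edgelist >= 1L), all(edgelist <= ncol(document_sums)))

alpha <- 0.1  # assumed smoothing value
beta <- 3     # a single weight; documented to be recycled to length K

# Attempt the call only when the package is available.
if (requireNamespace("lda", quietly = TRUE)) {
  probs <- lda::predictive.link.probability(edgelist, document_sums, alpha, beta)
  print(probs)
}
```

The result should contain one prediction per row of `edgelist`.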

Details

Whether a link exists between two documents i and j is modeled as a function of the weighted inner product of document_sums[,i] and document_sums[,j]. document_sums is first normalized column-wise, and the inner product of the normalized columns is then weighted by beta.

This quantity is then passed to a link probability function. As with rtm.collapsed.gibbs.sampler in this package, only the exponential link probability function is supported. Note that quantities are automatically scaled to be between 0 and 1.
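
The computation described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation. The helper name `sketch_link_score` is hypothetical, and the smoothing is assumed to add `alpha` to every entry before column-wise normalization. The final scaling to the range 0 to 1 is not reproduced, because the text does not specify how it is done.

```r
# Hypothetical helper: returns exp(weighted inner product) for each edge.
sketch_link_score <- function(edgelist, document_sums, alpha, beta) {
  K <- nrow(document_sums)
  # Assumption: smooth by alpha, then normalize each column to sum to 1.
  smoothed <- document_sums + alpha
  theta <- sweep(smoothed, 2, colSums(smoothed), "/")
  # Recycle beta to one weight per topic (row).
  beta <- rep_len(beta, K)
  # Weighted inner product of the two documents' topic proportions, per edge.
  # beta recycles down the rows, so beta[k] multiplies topic k.
  eta <- colSums(beta *
                   theta[, edgelist[, 1], drop = FALSE] *
                   theta[, edgelist[, 2], drop = FALSE])
  # Exponential link function, left unscaled in this sketch.
  exp(eta)
}

# Toy data: documents 1 and 2 share a dominant topic; document 3 does not.
document_sums <- matrix(c(5, 1, 0,
                          4, 2, 0,
                          0, 1, 6),
                        nrow = 3)
edgelist <- rbind(c(1L, 2L), c(1L, 3L))
scores <- sketch_link_score(edgelist, document_sums, alpha = 0.1, beta = 3)
print(scores)
```

With a positive `beta`, the pair of documents with more similar topic proportions (here documents 1 and 2) receives the larger score. Larger values of `alpha` pull every column toward the uniform distribution and so shrink the differences between pairs.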