Use the RTM to predict whether a link exists between two documents.
Description
This function takes a fitted LDA-type model (e.g., LDA or RTM) and predicts the likelihood of a link existing between pairs of documents.
Usage
predictive.link.probability(edgelist, document_sums, alpha, beta)

Arguments
edgelist 
A two-column integer matrix where each row represents an edge on which to make a prediction. An edge is expressed as a pair of integer indices (1-indexed) into the columns (i.e., documents) of document_sums (see below).
document_sums 
A K × D matrix where each entry is a numeric value proportional to the probability of seeing a topic (row) conditioned on a document (column). This entry is sometimes denoted θ_{d,k} in the literature (see Details). Either the document_sums field or the document_expects field of a fitted model's output can be used.

alpha 
The value of the Dirichlet hyperparameter generating the distribution over document_sums. This, in effect, smooths the similarity between documents.
beta 
A numeric vector of regression weights which is used to determine
the similarity between two vectors (see details). Arguments will be
recycled to create a vector of length 
Details
Whether or not a link exists between two documents i and j is a function of the weighted inner product of document_sums[,i] and document_sums[,j]. After document_sums is normalized column-wise, this inner product is weighted by beta. The resulting quantity is then passed to a link probability function. As with rtm.collapsed.gibbs.sampler in this package, only the exponential link probability function is supported. Note that quantities are automatically scaled to be between 0 and 1.
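The computation described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name sketch_link_score is hypothetical, alpha is assumed to be added to document_sums before normalization, beta is assumed to be recycled to one weight per topic (row), and the final rescaling to the range 0 to 1 is left out because the text does not say how it is done.

```r
# Hypothetical helper illustrating the Details section; not the package's code.
sketch_link_score <- function(edgelist, document_sums, alpha, beta) {
  # Assumption: alpha smooths the entries before each column (document)
  # is normalized to sum to 1.
  smoothed <- document_sums + alpha
  theta <- sweep(smoothed, 2, colSums(smoothed), "/")

  # Assumption: beta is recycled to one weight per topic (row).
  beta <- rep(beta, length.out = nrow(theta))

  # Topic proportions for the two endpoints of every edge (K x E matrices).
  from <- theta[, edgelist[, 1], drop = FALSE]
  to   <- theta[, edgelist[, 2], drop = FALSE]

  # Weighted inner product per edge: sum_k beta[k] * theta[k, i] * theta[k, j].
  weighted <- colSums(beta * from * to)

  # Exponential link function; rescaling to [0, 1] is omitted in this sketch.
  exp(weighted)
}

# Toy input: K = 3 topics (rows), D = 3 documents (columns).
document_sums <- matrix(c(4, 0, 1,
                          3, 1, 1,
                          0, 5, 0), nrow = 3)
# Edges (1, 2) and (1, 3), using 1-indexed document columns.
edgelist <- matrix(c(1L, 2L,
                     1L, 3L), ncol = 2, byrow = TRUE)
scores <- sketch_link_score(edgelist, document_sums, alpha = 0.1, beta = 3)
```

In this toy input, documents 1 and 2 share a dominant topic while documents 1 and 3 do not, so the first score comes out larger than the second. Only the ordering is meaningful here, since the sketch skips the scaling step.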
Value
A numeric vector of length dim(edgelist)[1], representing the probability of a link existing between each pair of documents given in the edge list.
Author(s)
Jonathan Chang (slycoder@gmail.com)
References
Chang, Jonathan and Blei, David M. Relational Topic Models for Document Networks. Artificial intelligence and statistics. 2009.
See Also
rtm.collapsed.gibbs.sampler for the format of document_sums. links.as.edgelist produces values for edgelist. predictive.distribution makes predictions about document content instead.
Examples
## See demo.
## Not run: demo(rtm)
