MAP estimation of Topic models
| counts | A matrix of multinomial response counts in  | 
| K | The number of latent topics.  If  | 
| shape |  Optional argument to specify the Dirichlet prior concentration parameter as  | 
| initopics |  Optional start-location for [θ_1 ... θ_K], the topic-phrase probabilities.  
Dimensions must accord with the smallest element of  | 
| tol |  Convergence tolerance: optimization stops, conditional on some extra checks, when the absolute posterior increase over a full parameter set update is less than  | 
| bf | An indicator for whether or not to calculate the Bayes factor for univariate  | 
| kill | For choosing from multiple  | 
| ord | If  | 
| verb | A switch for controlling printed output.   | 
| ... | Additional arguments to the undocumented internal  | 
A latent topic model represents the i'th document's term-count vector X_i 
(with ∑_{j} x_{ij} = m_i total phrase count)
as having been drawn from a mixture of K multinomials, each parameterized by topic-phrase
probabilities θ_k, such that 
X_i \sim MN(m_i, ω_{i1} θ_1 + ... + ω_{iK} θ_K).
We assign a K-dimensional Dirichlet(1/K) prior to each document's topic weights 
[ω_{i1}...ω_{iK}], and the prior on each θ_k is Dirichlet with concentration α.
The topics function uses quasi-Newton accelerated EM, augmented with sequential quadratic programming 
for conditional Ω | Θ updates, to obtain MAP estimates for the topic model parameters. 
We also provide Bayes factor estimation, from marginal likelihood
calculations based on a Laplace approximation around the converged MAP parameter estimates. If the input has length(K)>1, these
Bayes factors are used for model selection. Full details are in Taddy (2012). 
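The generative model described above can be sketched in base R. This is only an illustration, not the package's implementation: the helper rdirichlet_sketch, the dimensions, and the value alpha = 0.5 are all assumptions chosen for the example. It draws topic weights and topic-phrase probabilities from their Dirichlet priors, forms each document's mixture probability vector, simulates counts, and evaluates the multinomial log-likelihood up to an additive constant.

```r
## Hypothetical helper (not part of the package): draw n Dirichlet vectors by
## normalizing independent gamma variates; returns an n x length(alpha) matrix.
rdirichlet_sketch <- function(n, alpha) {
  g <- matrix(rgamma(n * length(alpha), shape = alpha), nrow = n, byrow = TRUE)
  g / rowSums(g)
}

set.seed(1)
K <- 3; p <- 8; n <- 5   # illustrative sizes only
alpha <- 0.5             # illustrative concentration for the topic prior

omega <- rdirichlet_sketch(n, rep(1 / K, K))      # n x K document topic weights
theta <- t(rdirichlet_sketch(K, rep(alpha, p)))   # p x K topic-phrase probabilities

## Row i of Q is omega_i1 * theta_1 + ... + omega_iK * theta_K
Q <- omega %*% t(theta)                           # n x p

## Simulate counts X_i ~ MN(m_i, Q[i, ])
m <- rpois(n, 50) + 1
X <- t(sapply(seq_len(n), function(i) rmultinom(1, size = m[i], prob = Q[i, ])))

## Multinomial log-likelihood, dropping the term that does not depend on parameters
loglik <- sum(X * log(Q))
```

Each row of Q sums to one because it is a convex combination of probability vectors, which is what makes the mixture a valid multinomial parameter.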
A topics object list with entries:
| K | The number of latent topics estimated.  If input  | 
| theta | The  | 
| omega | The  | 
| BF | The log Bayes factor for each number of topics in the input  | 
| D | Residual dispersion: for each element of  | 
| X | The input count matrix, in  | 
Estimates are actually functions of the MAP (K-1 or p-1)-dimensional logit transformed natural exponential family parameters.
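As a rough illustration of what a (K-1)-dimensional logit parameterization of a K-vector of probabilities looks like, the sketch below maps free real parameters to the simplex and back. The helper names and the choice of the first coordinate as the reference category are assumptions made for this example; the package's internal convention may differ.

```r
## Hypothetical helpers (not part of the package).
## Map K-1 free real parameters to K probabilities, fixing the first
## coordinate at zero as the reference category (an assumed convention).
free_to_simplex <- function(phi) {
  z <- c(0, phi)
  e <- exp(z - max(z))   # subtract the max for numerical stability
  e / sum(e)
}

## Inverse map: log-odds of each remaining category against the first
simplex_to_free <- function(w) log(w[-1] / w[1])

w <- free_to_simplex(c(0.3, -1.2))   # a length-3 probability vector
phi_back <- simplex_to_free(w)       # recovers the free parameters
```

Working in the unconstrained space lets an optimizer take ordinary steps without having to enforce positivity and sum-to-one constraints directly.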
Matt Taddy mataddy@gmail.com
Taddy (2012), On Estimation and Selection for Topic Models. http://arxiv.org/abs/1109.4518
plot.topics, summary.topics, predict.topics, wsjibm, congress109, we8there
## Simulation Parameters
K <- 10
n <- 100
p <- 100
omega <- t(rdir(n, rep(1/K,K)))
theta <- rdir(K, rep(1/p,p))
## Simulated counts
Q <- omega%*%t(theta)
counts <- matrix(ncol=p, nrow=n)
totals <- rpois(n, 100)
for(i in 1:n){ counts[i,] <- rmultinom(1, size=totals[i], prob=Q[i,]) }
## Bayes Factor model selection (should choose K or nearby)
summary(simselect <- topics(counts, K=K+c(-5:5)), nwrd=0)
## MAP fit for given K
summary(simfit <- topics(counts, K=K, verb=2), nwrd=0)
## Adjust for label switching and plot the fit (color by topic)
toplab <- rep(0,K)
for(k in 1:K){ toplab[k] <- which.min(colSums(abs(simfit$theta-theta[,k]))) }
par(mfrow=c(1,2))
tpxcols <- matrix(rainbow(K), ncol=ncol(theta), byrow=TRUE)
plot(theta,simfit$theta[,toplab], ylab="fitted values", pch=21, bg=tpxcols)
plot(omega,simfit$omega[,toplab], ylab="fitted values", pch=21, bg=tpxcols)
title("True vs Fitted Values (color by topic)", outer=TRUE, line=-2)
## The S3 method plot functions
par(mfrow=c(1,2))
plot(simfit, lgd.K=2)
plot(simfit, type="resid")