topics: Estimation for Topic Models

In maptpx: MAP Estimation of Topic Models


View source: R/topics.R

MAP estimation of Topic models

topics(counts, K, shape=NULL, initopics=NULL, tol=0.1,
       bf=FALSE, kill=2, ord=TRUE, verb=1, ...)

`counts`	A matrix of multinomial response counts in `ncol(counts)` phrases/categories for `nrow(counts)` documents/observations. Can be either a simple `matrix` or a `simple_triplet_matrix`.
`K`	The number of latent topics. If `length(K)>1`, `topics` will find the Bayes factor (vs a null single topic model) for each element and return parameter estimates for the highest probability K.
`shape`	Optional argument to specify the Dirichlet prior concentration parameter as `shape` for topic-phrase probabilities. Defaults to `1/(K*ncol(counts))`. For fixed single `K`, this can also be a `ncol(counts)` by `K` matrix of unique shapes for each topic element.
`initopics`	Optional start-location for [θ_1 ... θ_K], the topic-phrase probabilities. Dimensions must accord with the smallest element of `K`. If `NULL`, the initial estimates are built by incrementally adding topics.
`tol`	Convergence tolerance: optimization stops, conditional on some extra checks, when the absolute posterior increase over a full parameter set update is less than `tol`.
`bf`	An indicator for whether or not to calculate the Bayes factor for univariate `K`. If `length(K)>1`, this is ignored and Bayes factors are always calculated.
`kill`	For choosing from multiple `K` numbers of topics (evaluated in increasing order), the search will stop after `kill` consecutive drops in the corresponding Bayes factor. Specify `kill=0` if you want Bayes factors for all elements of `K`.
`ord`	If `TRUE`, the returned topics (columns of `theta`) will be ordered by decreasing usage (i.e., by decreasing `colSums(omega)`).
`verb`	A switch for controlling printed output. `verb > 0` will print something, with the level of detail increasing with `verb`.
`...`	Additional arguments to the undocumented internal `tpx*` functions.
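The `kill` stopping rule can be illustrated without fitting any model. The sketch below is base R only and is not the package's internal code: `select_K` and the log Bayes factor values are hypothetical, chosen to show how a scan over increasing `K` would stop after `kill` consecutive drops.

```r
# Hypothetical helper (not part of maptpx): mimic the documented `kill` rule
# on a precomputed vector of log Bayes factors, one per candidate K.
select_K <- function(K, logBF, kill = 2) {
  ord <- order(K)                 # candidates are evaluated in increasing order
  K <- K[ord]
  logBF <- logBF[ord]
  drops <- 0                      # consecutive drops seen so far
  last <- length(K)               # index of the last candidate evaluated
  for (i in seq_along(K)) {
    if (i > 1 && logBF[i] < logBF[i - 1]) {
      drops <- drops + 1
    } else {
      drops <- 0
    }
    if (kill > 0 && drops >= kill) {
      last <- i
      break
    }
  }
  seen <- seq_len(last)
  list(K = K[seen][which.max(logBF[seen])],  # best among the evaluated candidates
       evaluated = K[seen])
}

# Made-up log Bayes factors that peak at K = 4
res <- select_K(K = 2:8, logBF = c(10, 25, 40, 38, 31, 20, 5), kill = 2)
res$K          # selected number of topics
res$evaluated  # candidates visited before the search stopped
```

With `kill=0` the loop in this sketch never stops early, which matches the documented way to obtain Bayes factors for every element of `K`.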

A latent topic model represents the term-count vector X_i of the i'th document (with total phrase count ∑_{j} x_{ij} = m_i) as a draw from a mixture of K multinomials, each parameterized by topic-phrase probabilities θ_k, such that

X_i ∼ MN(m_i, ω_{i1} θ_1 + ... + ω_{iK} θ_K).

We assign a K-dimensional Dirichlet(1/K) prior to each document's topic weights [ω_{i1}...ω_{iK}], and the prior on each θ_k is Dirichlet with concentration α (the `shape` argument). The topics function uses quasi-Newton accelerated EM, augmented with sequential quadratic programming for conditional Ω | Θ updates, to obtain MAP estimates for the topic model parameters. We also provide Bayes factor estimation, from marginal likelihood calculations based on a Laplace approximation around the converged MAP parameter estimates. If input length(K)>1, these Bayes factors are used for model selection. Full details are in Taddy (2012).
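To make the generative model concrete, the following base R sketch draws a single document's counts. It is an illustration under stated assumptions, not package code: `rdirichlet1` is a hypothetical helper, the dimensions are arbitrary, and the concentration of 1 used for the topics is chosen only for readability.

```r
set.seed(1)
K <- 3      # number of topics
p <- 6      # number of phrases
m_i <- 50   # total phrase count for document i

# Hypothetical helper: one Dirichlet draw via normalized gamma variates
rdirichlet1 <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha)
  g / sum(g)
}

# p x K matrix of topic-phrase probabilities; column k plays the role of theta_k
theta <- sapply(seq_len(K), function(k) rdirichlet1(rep(1, p)))

# Topic weights for document i, drawn from the Dirichlet(1/K) prior
omega_i <- rdirichlet1(rep(1 / K, K))

# Mixture probabilities: omega_i1 * theta_1 + ... + omega_iK * theta_K
q_i <- as.vector(theta %*% omega_i)

# X_i ~ MN(m_i, q_i)
x_i <- as.vector(rmultinom(1, size = m_i, prob = q_i))
```

Repeating the last three steps for each document and stacking the `x_i` vectors as rows gives a count matrix with the layout that `counts` expects.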

A `topics` object: a list with entries

`K`	The number of latent topics estimated. If input `length(K)>1`, on output this is a single value corresponding to the model with the highest Bayes factor.
`theta`	The `ncol(counts)` by `K` matrix of estimated topic-phrase probabilities.
`omega`	The `nrow(counts)` by `K` matrix of estimated document-topic weights.
`BF`	The log Bayes factor for each number of topics in the input `K`, against a null single topic model.
`D`	Residual dispersion: for each element of `K`, estimated dispersion parameter (which should be near one for the multinomial), degrees of freedom, and p-value for a test of whether the true dispersion is >1.
`X`	The input count matrix, in `dgTMatrix` format.
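As a rough guide to reading `theta`, the sketch below lists the most probable phrases in each topic. The matrix and phrase labels are made up to stand in for a fitted `theta`, and `top_phrases` is a hypothetical helper rather than a maptpx function.

```r
# Hypothetical stand-in for a fitted theta: a p x K matrix, columns sum to one
theta <- cbind(c(0.60, 0.30, 0.05, 0.05),
               c(0.10, 0.10, 0.20, 0.60))
rownames(theta) <- c("tax", "budget", "school", "health")  # made-up phrase labels

# Hypothetical helper: names of the n most probable phrases in each topic
top_phrases <- function(theta, n = 2) {
  apply(theta, 2, function(th) {
    rownames(theta)[order(th, decreasing = TRUE)[seq_len(n)]]
  })
}

top <- top_phrases(theta, n = 2)   # n x K character matrix, one column per topic

# Each topic (column of theta) is a probability vector over phrases
stopifnot(all(abs(colSums(theta) - 1) < 1e-8))
```

The same column-wise reading applies to a real fit, where the rows of `omega` would give each document's weights across those topics.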

Estimates are actually functions of the MAP (K-1 or p-1)-dimensional logit transformed natural exponential family parameters.
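The dimension count in this note can be seen from a logit-style map between K probabilities and K-1 free parameters. The sketch below uses a reference-category convention (first logit fixed at zero), which is an assumption made for illustration; the exact parameterization used by the package is given in the reference and may differ. `to_simplex` and `to_logit` are hypothetical names.

```r
# Map K-1 unconstrained values to K probabilities, fixing the first logit at 0
# (an assumed reference-category convention)
to_simplex <- function(phi) {
  e <- exp(c(0, phi))
  e / sum(e)
}

# Inverse map: log-odds of each category against the first
to_logit <- function(w) log(w[-1] / w[1])

omega_i <- c(0.5, 0.3, 0.2)     # K = 3 topic weights
phi_i <- to_logit(omega_i)      # 2 free parameters
w_back <- to_simplex(phi_i)     # recovers omega_i up to rounding error
```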

Matt Taddy mataddy@gmail.com

Taddy (2012), On Estimation and Selection for Topic Models. http://arxiv.org/abs/1109.4518

plot.topics, summary.topics, predict.topics, wsjibm, congress109, we8there

library(maptpx)

## Simulation Parameters
K <- 10
n <- 100
p <- 100
omega <- t(rdir(n, rep(1/K,K)))
theta <- rdir(K, rep(1/p,p))

## Simulated counts
Q <- omega%*%t(theta)
counts <- matrix(ncol=p, nrow=n)
totals <- rpois(n, 100)
for(i in 1:n){ counts[i,] <- rmultinom(1, size=totals[i], prob=Q[i,]) }

## Bayes Factor model selection (should choose K or nearby)
summary(simselect <- topics(counts, K=K+c(-5:5)), nwrd=0)

## MAP fit for given K
summary(simfit <- topics(counts, K=K, verb=2), nwrd=0)

## Adjust for label switching and plot the fit (color by topic)
toplab <- rep(0,K)
for(k in 1:K){ toplab[k] <- which.min(colSums(abs(simfit$theta-theta[,k]))) }
par(mfrow=c(1,2))
tpxcols <- matrix(rainbow(K), ncol=ncol(theta), byrow=TRUE)
plot(theta,simfit$theta[,toplab], ylab="fitted values", pch=21, bg=tpxcols)
plot(omega,simfit$omega[,toplab], ylab="fitted values", pch=21, bg=tpxcols)
title("True vs Fitted Values (color by topic)", outer=TRUE, line=-2)

## The S3 method plot functions
par(mfrow=c(1,2))
plot(simfit, lgd.K=2)
plot(simfit, type="resid")

maptpx documentation built on July 1, 2020, 10:35 p.m.
