Plot probable tokens for a given topic
plotTokens(phi = vector(), vocab = character(), n.tokens = 20,
  lambda = 0.5, p = vector(), ...)
phi | numeric vector with the probability of each token for a given topic
vocab | character vector of the vocabulary for the corpus
n.tokens | the number of tokens to plot; the default is 20
lambda | a parameter between 0 and 1 that controls how tokens are ranked within topics
p | the marginal probabilities of the tokens in the vocabulary
... | additional arguments passed to the plot() function
The ranking of tokens within a topic is based on a weighted
average, on the log scale, of the probability of a token (given the topic) and
its lift, where the lift of a token is defined as the
probability of the token (given the topic) divided by the
marginal probability of the token (i.e. across all topics).
The score that determines the top n.tokens tokens to plot is
lambda * log(p(token | topic)) + (1 - lambda) * log(p(token | topic)/p(token)).
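The ranking described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name rankTokens and the toy values are hypothetical, and it assumes every entry of phi and p is strictly positive so that the logarithms are finite.

```r
# Hypothetical helper (not part of the package): score and rank tokens
# for a single topic using a weighted average of log probability and log lift.
rankTokens <- function(phi, vocab, p, lambda = 0.5, n.tokens = 20) {
  stopifnot(length(phi) == length(vocab), length(phi) == length(p))
  lift  <- phi / p                                   # p(token | topic) / p(token)
  score <- lambda * log(phi) + (1 - lambda) * log(lift)
  # indices of the highest-scoring tokens, capped at the vocabulary size
  top <- order(score, decreasing = TRUE)[seq_len(min(n.tokens, length(score)))]
  data.frame(token = vocab[top], score = score[top], stringsAsFactors = FALSE)
}

# Toy values, made up purely for illustration
vocab <- c("the", "court", "judge", "said")
phi   <- c(0.40, 0.30, 0.20, 0.10)   # p(token | topic)
p     <- c(0.50, 0.05, 0.02, 0.43)   # marginal p(token)

by.prob <- rankTokens(phi, vocab, p, lambda = 1, n.tokens = 2)  # probability only
by.lift <- rankTokens(phi, vocab, p, lambda = 0, n.tokens = 2)  # lift only
```

With lambda = 1 the frequent token "the" ranks first, while lambda = 0 favours tokens that are rare overall but common in the topic; intermediate values trade off the two.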
Note: the ordering of phi, vocab, and p must be the same (i.e. the nth element of each
vector must correspond to the same token).
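One way to guard against a mismatched ordering is to align the vectors by token name before plotting. The sketch below assumes that phi and p are named vectors whose names are the tokens, which the function itself does not require; the values are made up.

```r
# Made-up named vectors whose token order differs
phi <- c(judge = 0.2, court = 0.3, the = 0.5)   # p(token | topic)
p   <- c(the = 0.6, judge = 0.1, court = 0.3)   # marginal p(token)

vocab <- names(phi)
stopifnot(setequal(vocab, names(p)))   # both vectors must cover the same tokens
p <- p[vocab]                          # reorder p to follow the order of phi
stopifnot(identical(names(p), names(phi)))
```

After this step the nth element of phi, vocab, and p refers to the same token.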
data(APinput)
data(APtopics)  # load stored output instead, for demonstration
probs <- getProbs(word.id=APinput$word.id, doc.id=APinput$doc.id, topic.id=APtopics$topics,
                  vocab=APinput$vocab)
# THE ORDERING OF phi, vocab and p MUST MATCH!
tokens <- factor(APinput$vocab[APinput$word.id], levels=colnames(probs$phi.hat))
token.tab <- table(tokens)
p <- token.tab/sum(token.tab)
plotTokens(phi=probs$phi.hat[1,], vocab=names(p), n.tokens=30, lambda=1/3, p=p)
# plot all the topics!
## Not run:
for (i in seq_len(nrow(probs$phi.hat))) {
  plotTokens(phi=probs$phi.hat[i,], vocab=names(p), n.tokens=30, lambda=1/3, p=p)
}
## End(Not run)