tw_smooth_normalize | R Documentation |
The "raw" final sampling state of words in topics may be transformed into either estimated probabilities or other kinds of salience scores. These methods produce functions that operate on a topic-word matrix. They can be passed as the weighting parameter to top_words.
tw_smooth_normalize(m)
tw_smooth(m)
tw_blei_lafferty(m)
tw_sievert_shirley(m, lambda = 0.6)
m |
a |
lambda |
For |
The basic method (tw_smooth_normalize) is to recast the sampled word counts as probabilities by adding the estimated hyperparameter β and then normalizing rows so they add to 1. This is equivalent to mallet.topic.words with smooth and normalize set to TRUE. Naturally this will not change the relative ordering of words within topics.
tw_smooth simply adds β but does not normalize.
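The two steps described above can be sketched in base R on a toy matrix. This is an illustration only, not the package's implementation: the counts and the value of beta are invented, and a dense matrix stands in for the sparse topic-word matrix.

```r
# Toy topic-word count matrix: 2 topics (rows) x 3 words (columns).
# The counts and beta below are made-up values for illustration.
counts <- matrix(c(8, 1, 0,
                   0, 2, 7),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("t1", "t2"), c("apple", "bank", "cat")))
beta <- 0.5

# Smoothing only: add beta to every cell.
smoothed <- counts + beta

# Smoothing plus normalization: divide each row by its sum,
# so that every topic's row becomes a probability distribution.
probs <- smoothed / rowSums(smoothed)

rowSums(probs)

# Neither step changes the ordering of words within a topic.
order(counts["t1", ], decreasing = TRUE)
order(probs["t1", ], decreasing = TRUE)
```

Because adding a constant and dividing by a positive row total are both monotone within a row, the two calls to order() return the same permutation.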
A method that can re-rank words has been given by Blei and Lafferty: the score for word v in topic t is
p(t,v) log(p(t,v) / ∏_k p(k,v)^(1/K))
where p(t,v) is the estimated probability of word v in topic t and K is the number of topics. The score gives more weight to words that are ranked highly in fewer topics.
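A minimal sketch of this score in base R may help show how it can re-rank words. The helper blei_lafferty_score is hypothetical (it is not a package function), the probabilities are invented, and all entries are assumed to be strictly positive so that the logarithms are finite.

```r
# Hypothetical helper: p is a topics x words matrix of probabilities,
# with every entry > 0.
blei_lafferty_score <- function(p) {
  # Log of the geometric mean of p(k, v) over topics k, one value per word.
  log_geo_mean <- colMeans(log(p))
  # sweep() subtracts the per-word value from each column of log(p).
  p * sweep(log(p), 2, log_geo_mean)
}

# Made-up example: "the" is the most probable word in both topics.
p <- matrix(c(0.45, 0.50, 0.05,
              0.05, 0.50, 0.45),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("t1", "t2"), c("gene", "the", "court")))

scores <- blei_lafferty_score(p)

names(which.max(p["t1", ]))       # top word by probability
names(which.max(scores["t1", ]))  # top word by score
```

In this toy case "the" has the same probability in every topic, so its ratio to the geometric mean is 1 and its score is zero, while "gene" moves to the top of topic t1.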
Another method is the "relevance" score of Sievert and Shirley: in this case the score is given by
λ log(p(t,v)) + (1 - λ) log(p(t,v) / p(v))
where p(v) is the overall probability of word v in the corpus and λ is a weighting parameter, set to 0.6 by default, that determines how strongly words common in the whole corpus are penalized (smaller values of λ penalize them more).
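The effect of λ can be seen in a small base R sketch. The helper relevance_score is hypothetical and not part of the package. The probabilities are invented, and p(v) is computed here under the assumption that both topics carry equal weight in the corpus; how the package estimates p(v) is not stated above.

```r
# Hypothetical helper: p is a topics x words probability matrix,
# p_v is a vector of overall word probabilities (one per column of p).
relevance_score <- function(p, p_v, lambda = 0.6) {
  lift <- sweep(p, 2, p_v, "/")  # p(t, v) / p(v)
  lambda * log(p) + (1 - lambda) * log(lift)
}

# Made-up example: "the" is the most probable word in both topics.
p <- matrix(c(0.45, 0.50, 0.05,
              0.05, 0.50, 0.45),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("t1", "t2"), c("gene", "the", "court")))

# Assumption for this sketch: topics are equally weighted in the corpus.
p_v <- colMeans(p)

# lambda = 1 ranks purely by within-topic probability.
names(which.max(relevance_score(p, p_v, lambda = 1)["t1", ]))
# The default lambda = 0.6 penalizes the corpus-wide common word.
names(which.max(relevance_score(p, p_v)["t1", ]))
```

With lambda = 1 the score reduces to log(p(t,v)), so the ranking matches the plain probabilities; lowering lambda shifts weight toward the ratio p(t,v) / p(v).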
a function of one variable, to be applied to the topic-word sparse matrix.
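Since each method returns a function rather than a matrix, a call such as tw_smooth_normalize(m)(topic_words(m)) involves two applications. The pattern can be sketched as a closure in base R. The factory make_smooth_normalize below is hypothetical: it takes beta directly instead of a model object, because how the package reads β from m is not shown here, and it uses a dense matrix in place of the sparse one.

```r
# Hypothetical factory: captures beta and returns a function of one
# variable, which is then applied to a topic-word matrix.
make_smooth_normalize <- function(beta) {
  function(tw) {
    smoothed <- as.matrix(tw) + beta
    smoothed / rowSums(smoothed)
  }
}

# Step 1: build the weighting function (beta is a made-up value).
weight <- make_smooth_normalize(beta = 0.5)

# Step 2: apply it to a toy topic-word count matrix.
tw <- matrix(c(8, 1, 0,
               0, 2, 7),
             nrow = 2, byrow = TRUE)
weight(tw)
```

Returning a function lets the caller fix the model-specific parameter once and hand the result to another routine that supplies the matrix later.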
D. Blei and J. Lafferty. Topic Models. In A. Srivastava and M. Sahami, editors, Text Mining: Classification, Clustering, and Applications. Chapman & Hall/CRC Data Mining and Knowledge Discovery Series, 2009. http://www.cs.princeton.edu/~blei/papers/BleiLafferty2009.pdf.
C. Sievert and K.E. Shirley. LDAvis: A method for visualizing and interpreting topics. http://nlp.stanford.edu/events/illvi2014/papers/sievert-illvi2014.pdf.
## Not run:
top_words(m, n=10, weighting=tw_blei_lafferty(m))
tw_smooth_normalize(m)(topic_words(m))
## End(Not run)