tw_smooth_normalize: Scoring methods for words in topics
In agoldst/dfrtopics: Tools for exploring topic models of text

tw_smooth_normalize

R Documentation

Scoring methods for words in topics

Description

The "raw" final sampling state of words in topics may be transformed into either estimated probabilities or other kinds of salience scores. These methods produce functions that operate on a topic-word matrix. They can be passed as the weighting parameter to top_words.

Usage

tw_smooth_normalize(m)

tw_smooth(m)

tw_blei_lafferty(m)

tw_sievert_shirley(m, lambda = 0.6)

Arguments

`m`	a `mallet_model` object
`lambda`	For `tw_sievert_shirley`, the weighting parameter λ, by default 0.6.

Details

The basic method (tw_smooth_normalize) is to recast the sampled word counts as probabilities by adding the estimated hyperparameter β and then normalizing rows so they add to 1. This is equivalent to mallet.topic.words with smoothed and normalized set to TRUE. Because every count in a row is shifted and scaled by the same amounts, this does not change the relative ordering of words within topics.
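The smoothing and normalizing steps can be sketched in base R on a toy matrix. This is only an illustration, not the package's implementation: the matrix `tw`, its word labels, and the value of `beta` are all made up here, whereas the package works from a fitted model and its estimated β.

```r
# Toy topic-word count matrix: 2 topics (rows) x 3 words (columns).
# The counts, labels, and beta are hypothetical values for illustration.
tw <- matrix(c(4, 1, 0,
               0, 2, 3),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("topic1", "topic2"),
                             c("cat", "dog", "fish")))
beta <- 0.5

# Smooth: add beta to every cell.
smoothed <- tw + beta

# Normalize: divide each row by its own sum.
# (A vector of length nrow recycles down the columns, so cell [i, j]
# is divided by the sum of row i.)
probs <- smoothed / rowSums(smoothed)

# Each row now sums to 1, and the within-topic ordering is unchanged.
rowSums(probs)
order(tw["topic1", ], decreasing = TRUE)
order(probs["topic1", ], decreasing = TRUE)
```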

tw_smooth simply adds β but does not normalize.

A method that can re-rank words has been given by Blei and Lafferty: the score for word v in topic t is

p(t,v) log(p(t,v) / ∏_k p(k,v)^(1/K))

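where p(t,v) is the estimated probability of word v in topic t and K is the number of topics. The denominator is the geometric mean of the word's probability across all topics, so the score gives more weight to words that are ranked highly in fewer topics.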
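As a rough sketch of how this score can re-rank words, the formula can be computed in base R on a small probability matrix. The matrix `p` and its word labels are invented for illustration, and the code is an assumption about one way to evaluate the formula, not the package's own code. It requires every probability to be strictly positive, which smoothing guarantees.

```r
# Toy topic-word probabilities: 2 topics (rows) x 3 words (columns).
# Values are hypothetical; each row sums to 1.
p <- matrix(c(0.5, 0.4, 0.1,
              0.5, 0.1, 0.4),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("topic1", "topic2"),
                            c("the", "cat", "fish")))

log_p <- log(p)

# Log of the geometric mean of each word's probability across topics:
# log(prod_k p(k,v)^(1/K)) = mean over k of log p(k,v).
log_gm <- colMeans(log_p)

# score(t,v) = p(t,v) * (log p(t,v) - log geometric mean for word v)
score <- p * sweep(log_p, 2, log_gm)

# "the" is equally probable in both topics, so its score is 0 and
# "cat" overtakes it as the top word of topic1.
names(which.max(p["topic1", ]))
names(which.max(score["topic1", ]))
```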

Another method is the "relevance" score of Sievert and Shirley: in this case the score is given by

λ log(p(t,v)) + (1 - λ) log(p(t,v) / p(v))

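where p(v) is the overall probability of word v in the corpus and λ is a weighting parameter, set to 0.6 by default, which determines the amount by which words common in the whole corpus are penalized. At λ = 1 words are ranked by their probability within the topic alone; smaller values of λ penalize corpus-wide common words more heavily.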
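The effect of λ can be seen in a small base R sketch. Everything here is an assumption made for illustration: the count matrix `tw`, the helper `relevance`, and in particular the choice to estimate p(v) from the pooled toy counts. The package may derive these quantities differently.

```r
# Toy topic-word counts: 2 topics (rows) x 3 words (columns).
# All values are hypothetical.
tw <- matrix(c(50, 40, 10,
               50, 10, 40),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("topic1", "topic2"),
                             c("the", "cat", "fish")))

p   <- tw / rowSums(tw)        # p(t,v): word probabilities within each topic
p_v <- colSums(tw) / sum(tw)   # p(v): assumed here to be the pooled word share

# Hypothetical helper implementing
# lambda * log(p(t,v)) + (1 - lambda) * log(p(t,v) / p(v))
relevance <- function(p, p_v, lambda = 0.6) {
  lambda * log(p) + (1 - lambda) * log(sweep(p, 2, p_v, "/"))
}

rel_default <- relevance(p, p_v)              # lambda = 0.6
rel_prob    <- relevance(p, p_v, lambda = 1)  # reduces to log(p(t,v))

# With lambda = 1 the corpus-wide common word "the" leads topic1;
# with lambda = 0.6 it is penalized and "cat" leads instead.
names(which.max(rel_prob["topic1", ]))
names(which.max(rel_default["topic1", ]))
```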

Value

a function of one variable, to be applied to the topic-word sparse matrix.

References

D. Blei and J. Lafferty. Topic Models. In A. Srivastava and M. Sahami, editors, Text Mining: Classification, Clustering, and Applications. Chapman & Hall/CRC Data Mining and Knowledge Discovery Series, 2009. http://www.cs.princeton.edu/~blei/papers/BleiLafferty2009.pdf.

C. Sievert and K.E. Shirley. LDAvis: A method for visualizing and interpreting topics. http://nlp.stanford.edu/events/illvi2014/papers/sievert-illvi2014.pdf.

Examples

## Not run: top_words(m, n=10, weighting=tw_blei_lafferty(m))
## Not run: tw_smooth_normalize(m)(topic_words(m))

agoldst/dfrtopics documentation built on July 15, 2022, 4:13 p.m.
