calclift: Calculate Lift Words

View source: R/STMfunctions.R

calclift R Documentation

Calculate Lift Words

Description

A primarily internal function for calculating words according to the lift metric. We expect most users will use labelTopics instead.

Usage

calclift(logbeta, wordcounts)

Arguments

logbeta

a K by V matrix containing the log probabilities of seeing word v conditional on topic k

wordcounts

a V length vector indicating the number of times each word appears in the corpus.

Details

Lift is calculated by dividing the topic-word distribution by the empirical word count probability distribution. In other words, the lift for word v in topic k can be calculated as:

Lift_{k,v} = \beta_{k,v}/(w_v/\sum_{v'} w_{v'})
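The formula can be illustrated with a minimal sketch in base R. This is not the package's implementation: the toy values and the names beta, emp_prob, log_lift, lift and top_words are assumptions made for illustration. It works on the log scale because logbeta holds log probabilities, then ranks the words in each topic by lift.

```r
# Hypothetical toy inputs: K = 2 topics, V = 4 words (values are made up).
beta <- rbind(c(0.50, 0.30, 0.15, 0.05),
              c(0.10, 0.10, 0.20, 0.60))
logbeta <- log(beta)               # K by V matrix of log probabilities
wordcounts <- c(40, 30, 20, 10)    # length-V vector of corpus word counts

# Empirical word probabilities: w_v divided by the total word count.
emp_prob <- wordcounts / sum(wordcounts)

# Lift on the log scale: subtract log(emp_prob[v]) from column v of logbeta.
log_lift <- sweep(logbeta, 2, log(emp_prob), "-")
lift <- exp(log_lift)              # equals beta[k, v] / emp_prob[v]

# For each topic (row), word indices ordered from highest to lowest lift.
top_words <- t(apply(lift, 1, order, decreasing = TRUE))
```

Each row of top_words lists word indices for one topic, highest lift first. Words that are rare in the corpus but comparatively likely under a topic receive the largest lift.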

We include this metric after seeing it used effectively in Matt Taddy's work, including his excellent maptpx package. Definitions are given in Taddy (2012).

References

Taddy, Matthew. 2012. "On Estimation and Selection for Topic Models." AISTATS JMLR W&CP 22

See Also

labelTopics


bstewart/stm documentation built on Jan. 3, 2024, 6:58 p.m.