Label topics

Description

Generate a set of words describing each topic from a fitted STM object. Uses a variety of labeling algorithms (see details).

Usage

labelTopics(model, topics=NULL, n = 7, frexweight = 0.5)

Arguments

model

An STM model object.

topics

A vector of numbers indicating the topics to include. Default is all topics.

n

The desired number of words (per type) used to label each topic.

frexweight

A weight used in our approximate FREX scoring algorithm (see details).

Details

labelTopics prints four different types of word weightings for each topic.

Highest Prob: the words within each topic with the highest probability (inferred directly from the topic-word distribution parameter β).

FREX: the words that are both frequent and exclusive, which identifies words that distinguish topics. The score is the harmonic mean of two ranks: the word's rank by probability within the topic (frequency) and its rank by the distribution of topic given word, p(z|w=v) (exclusivity). To estimate exclusivity we use a James-Stein type shrinkage estimator of the distribution p(z|w=v).
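
The rank-based harmonic mean described above can be illustrated with a small base R sketch. This is not the package's internal code: the matrix `beta`, the helper `frex_sketch`, and the treatment of `w` as the weight on the frequency rank are all assumptions made for illustration, and the James-Stein shrinkage step is left out.

```r
# Toy topic-word probabilities: 2 topics (rows) x 5 words (columns).
# The numbers are made up for illustration only.
beta <- rbind(
  c(0.40, 0.30, 0.20, 0.07, 0.03),
  c(0.40, 0.04, 0.06, 0.30, 0.20)
)
colnames(beta) <- c("the", "tax", "budget", "goal", "match")

# Hypothetical helper: FREX-style score without shrinkage.
# w is treated here as the weight on the frequency rank (an assumption).
frex_sketch <- function(beta, w = 0.5) {
  # Unshrunk stand-in for p(z | w = v): each word's probability mass
  # divided across topics, assuming equally weighted topics.
  excl <- sweep(beta, 2, colSums(beta), "/")
  # Within-topic ranks scaled to (0, 1]; 1 marks the largest value in a topic.
  to_rank <- function(m) t(apply(m, 1, function(x) rank(x) / length(x)))
  freq_rank <- to_rank(beta)
  excl_rank <- to_rank(excl)
  # Weighted harmonic mean of the two scaled ranks.
  1 / (w / freq_rank + (1 - w) / excl_rank)
}

scores <- frex_sketch(beta)

# Top word per topic by raw probability and by the FREX-style score.
top_prob <- colnames(beta)[apply(beta, 1, which.max)]
top_frex <- colnames(beta)[apply(scores, 1, which.max)]
print(top_prob)
print(top_frex)
```

In this toy example "the" is the most probable word in both topics, but because it is shared equally between them, the FREX-style score ranks "tax" and "goal" first instead.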

Score and Lift are measures provided in two other popular text mining packages. For more information on type Score, see the R package lda. For more information on type Lift, see Taddy, "Multinomial Inverse Regression for Text Analysis", Journal of the American Statistical Association 108, 2013 and the R package textir.
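
For orientation, the two measures can be sketched in base R using their commonly stated definitions: lift divides a word's topic probability by its empirical frequency in the corpus, and the lda-style score multiplies the probability by the difference between its log and the average log probability across topics. The toy `beta` matrix and `word_counts` vector are made up, and the code is an illustration of those definitions rather than the implementation used by any of the packages mentioned.

```r
# Toy inputs (made up): topic-word probabilities and corpus word counts.
beta <- rbind(
  c(0.40, 0.30, 0.20, 0.07, 0.03),
  c(0.40, 0.04, 0.06, 0.30, 0.20)
)
colnames(beta) <- c("the", "tax", "budget", "goal", "match")
word_counts <- c(the = 400, tax = 170, budget = 130, goal = 185, match = 115)

logbeta <- log(beta)

# Lift on the log scale: log(beta) minus log of the empirical word frequency.
emp_freq <- word_counts / sum(word_counts)
log_lift <- sweep(logbeta, 2, log(emp_freq), "-")

# lda-style score: beta * (log(beta) - mean over topics of log(beta)).
score <- beta * sweep(logbeta, 2, colMeans(logbeta), "-")

# Top word per topic under each measure.
top_lift  <- colnames(beta)[apply(log_lift, 1, which.max)]
top_score <- colnames(beta)[apply(score, 1, which.max)]
print(top_lift)
print(top_score)
```

Both measures push down words such as "the" that are common everywhere, but lift tends to favor rarer words more strongly than score does.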

Value

A labelTopics object (list)

prob

matrix of highest probability words

frex

matrix of highest ranking FREX words

lift

matrix of highest scoring words by lift

score

matrix of best words by score

topicnums

a vector of topic numbers which correspond to the rows

References

Taddy, Matt. "Multinomial inverse regression for text analysis." Journal of the American Statistical Association 108.503 (2013): 755-770.

See Also

stm, plot.STM

Examples

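
A minimal usage sketch follows. It assumes the stm package is installed and that `gadarianFit`, an example fitted model, is available as a dataset shipped with the package. The argument values are arbitrary choices for illustration.

```r
library(stm)

# gadarianFit is assumed to be the example fitted model shipped with stm.
lab <- labelTopics(gadarianFit, n = 5)

# Print all four label types for every topic.
print(lab)

# The returned list can also be inspected directly:
# each matrix has one row per topic and n columns.
lab$prob
lab$frex
lab$topicnums

# Restrict the output to selected topics and change the FREX weight.
lab_subset <- labelTopics(gadarianFit, topics = 1:2, n = 5, frexweight = 0.7)
print(lab_subset)
```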
