top_terms: Weighted top terms


Description

There are multiple ways to weight the terms within a topic. The following schemes are implemented:

type_probability

p(w|k) The probability of a type given the topic. The most common weighting scheme.

topic_probability

p(k|w) The probability of a topic given a term.

term_score

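p(w|k) * log(p(w|k) / (∏_k' p(w|k'))^(1/K)) A weighting scheme inspired by tf-idf, proposed by Blei and Lafferty (2009). The product runs over all K topics, so the denominator is the geometric mean of p(w|k') across topics.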

relevance

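log(p(w|k) / p(w)^(1-λ)) A weighting scheme proposed by Sievert and Shirley (2014). Here p(w) is the marginal probability of the type, summed over topics; the expression is equivalent to λ log p(w|k) + (1-λ) log(p(w|k) / p(w)).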

n_wk

n_wk Order by the number of topic indicators. Gives the same result as type_probability but is faster.
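
The schemes above can be sketched in base R on a small, made-up count matrix. This only illustrates the formulas and is not the package's implementation: the matrix n_wk, its values, and every variable name are assumptions, and the relevance line assumes the form log(p(w|k) / p(w)^(1-λ)).

```r
# Hypothetical toy counts: rows are types (words), columns are topics.
# All counts are positive so that no log(0) occurs in this sketch.
n_wk <- matrix(
  c(8, 1,
    2, 6,
    5, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(type = c("tax", "war", "people"), topic = c("k1", "k2"))
)

# type_probability: p(w|k), each topic column normalised to sum to 1.
p_w_given_k <- sweep(n_wk, 2, colSums(n_wk), "/")

# topic_probability: p(k|w), each type row normalised to sum to 1.
p_k_given_w <- sweep(n_wk, 1, rowSums(n_wk), "/")

# term_score: p(w|k) * log(p(w|k) / geometric mean of p(w|k') over topics).
geo_mean <- exp(rowMeans(log(p_w_given_k)))
term_score <- p_w_given_k * log(p_w_given_k / geo_mean)

# relevance: log(p(w|k) / p(w)^(1 - lambda)), p(w) being the marginal type probability.
lambda <- 0.6
p_w <- rowSums(n_wk) / sum(n_wk)
relevance <- log(p_w_given_k / p_w^(1 - lambda))
```

Dividing a matrix by a vector whose length equals the number of rows recycles the vector down each column, which is why geo_mean and p_w line up with the types here.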

Usage

top_terms(x, scheme = "type_probability", j = 10, beta = 0, ...)

Arguments

x

A tidy_topic_state

scheme

The weight scheme to use. Default is type_probability.

j

The number of types to return. Default is 10.

beta

The beta hyperparameter. Default is 0 (no prior smoothing).

...

Additional parameters used by the weighting schemes. See Details.

Details

Values are returned only for type-topic combinations that exist in the model. This means that unless beta is set to 0, the returned probabilities will not sum to 1.
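
A minimal base R sketch of why the returned probabilities fall short of 1 when beta is positive. It assumes the usual smoothed estimate (n_wk + beta) / (n_k + V * beta), which this page does not spell out; the counts and names are made up for illustration.

```r
# Hypothetical counts for one topic over a vocabulary of V = 5 types;
# two of the types never occur in this topic.
n_wk <- c(tax = 6, war = 3, people = 1, peace = 0, trade = 0)
beta <- 0.01
V <- length(n_wk)

# Assumed smoothed estimate: (n_wk + beta) / (n_k + V * beta).
p_smoothed <- (n_wk + beta) / (sum(n_wk) + V * beta)

# Keep only the type-topic combinations that were actually observed.
p_observed <- p_smoothed[n_wk > 0]

sum(p_smoothed)  # over the full vocabulary
sum(p_observed)  # over observed types only
```

Under this assumption the full vocabulary sums to 1, while the observed types alone miss the mass that smoothing assigns to the unseen types.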

If there are ties in weight or probability, the original order is kept.
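
The tie behaviour can be illustrated with base R's order(), which leaves tied values in their original positions. Whether top_terms uses order() internally is an assumption; the weights below are made up.

```r
# Hypothetical weights with two pairs of ties.
weights <- c(tax = 0.4, war = 0.2, people = 0.4, peace = 0.2)

# Sort by decreasing weight; ties stay in their original relative order.
ranked <- weights[order(-weights)]
names(ranked)
```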

The relevance weighting uses the additional parameter lambda, which defaults to 0.6.
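
To show what lambda controls, here is a small base R sketch on made-up counts. It assumes the relevance form log(p(w|k) / p(w)^(1 - lambda)); the helper relevance_for and the data are hypothetical and not part of the package.

```r
# Hypothetical toy counts: rows are types, columns are topics.
n_wk <- matrix(c(20, 10,
                  1,  8,
                  9,  2),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("people", "war", "tax"), c("k1", "k2")))

p_w_given_k <- sweep(n_wk, 2, colSums(n_wk), "/")  # p(w|k)
p_w <- rowSums(n_wk) / sum(n_wk)                   # marginal p(w)

# Relevance of every type in one topic for a given lambda (hypothetical helper).
relevance_for <- function(topic, lambda = 0.6) {
  log(p_w_given_k[, topic] / p_w^(1 - lambda))
}

# lambda = 1 ranks by p(w|k) alone; lambda = 0 ranks by the lift p(w|k) / p(w).
top_lambda_1 <- names(sort(relevance_for("k2", lambda = 1), decreasing = TRUE))
top_lambda_0 <- names(sort(relevance_for("k2", lambda = 0), decreasing = TRUE))
```

In this toy data the frequent type "people" leads topic k2 at lambda = 1, while the more topic-specific "war" leads at lambda = 0; intermediate values trade off the two.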

Value

Returns a tibble with the topics, their top terms, and the corresponding weights.

References

Blei, D. M., & Lafferty, J. D. (2009). Topic models. Text mining: classification, clustering, and applications, 10(71), 34.

Sievert, C., & Shirley, K. E. (2014). LDAvis: A method for visualizing and interpreting topics. In Proceedings of the workshop on interactive language learning, visualization, and interfaces (pp. 63-70).

Examples

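# Load the sotu50 example data
data("sotu50")
# Top terms ordered by the number of topic indicators
w <- top_terms(x = sotu50, scheme = "n_wk")
# Default scheme (type_probability) with beta = 0.01
w <- top_terms(x = sotu50, beta = 0.01)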

MansMeg/tidytopics documentation built on May 8, 2019, 3:52 p.m.