topic_stability: Compute topic stability for over-optimal topic specifications


View source: R/topic_stability.R

Description

Implements a fast chi-square-like test to evaluate the stability of redundant topics.

Usage

topic_stability(
  lda_models,
  optimal_model,
  q = 0.8,
  alpha = 0.05,
  do_plot = TRUE
)

Arguments

lda_models

A list of ordered LDA models as estimated by LDA. The LDA models must be in ascending order according to the number of topics.
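As a rough illustration of the ordering requirement, the sketch below builds such a list with `LDA` from the topicmodels package. The data subset, the grid `k_grid`, and the name `lda_list` are illustrative assumptions, not part of this function's documentation.

```r
library(topicmodels)

# Small example corpus shipped with topicmodels (assumption: any
# document-term matrix accepted by LDA() would work here).
data("AssociatedPress", package = "topicmodels")
dtm <- AssociatedPress[1:50, ]

# Hypothetical grid of topic numbers, kept in ascending order.
k_grid <- 2:5

# Fit one model per value of k; lapply() preserves the order of k_grid.
lda_list <- lapply(k_grid, function(k) {
  LDA(dtm, k = k, control = list(seed = 1))
})

# Check that the list is ordered by number of topics.
n_topics <- sapply(lda_list, function(m) m@k)
stopifnot(!is.unsorted(n_topics))
```

Fitting the models over a sorted grid, rather than sorting afterwards, keeps the list in the order the argument expects.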

optimal_model

A number corresponding to the optimal topic model.

q

Cutoff for important words, given as a quantile of the expected cumulative probability of word weights. Defaults to 0.80, meaning that the function keeps the words covering 80% of the distribution mass and leaves out the remaining 20%.
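The idea of a cumulative-mass cutoff can be sketched in base R as follows. This is only one simple reading of the description above, using made-up word weights for a single topic; it is not the package's internal code.

```r
# Hypothetical word weights for one topic (they sum to 1).
w <- c(apple = 0.40, bank = 0.25, cat = 0.20,
       dog = 0.08, egg = 0.05, fig = 0.02)
q <- 0.80

# Rank words by weight and accumulate their probability mass.
w_sorted <- sort(w, decreasing = TRUE)
cum_mass <- cumsum(w_sorted)

# Keep the smallest set of top-ranked words whose mass reaches q.
n_keep <- min(which(cum_mass >= q))
important <- names(w_sorted)[seq_len(n_keep)]
print(important)
```

With these weights the first three words are kept, since their cumulative mass is the first to reach the 0.80 cutoff.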

alpha

Alpha level used to identify informative words from the Cumulative Distribution Function over the cosine similarities in the Topic Word Weights matrix. Defaults to 0.05.

do_plot

Whether to plot the chi-square statistic as a function of the number of topics. Defaults to TRUE. In the plot, the horizontal dot-dashed line represents the significance level according to alpha.

Details

This function implements Test 3 as defined in Lewis and Grossetti (2019). Test 3 evaluates the aggregated stability of over-optimal topic specifications by summing each point-wise contribution. See 'Value' to understand how topic_stability returns the results.

Value

A 'data.table' containing the following columns:

topic

An integer giving the number of topics.

df

An integer giving the degrees of freedom.

chisq

A numeric giving the chi-square statistic.
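One way to read a table with these three columns is to compare each statistic with the upper-tail chi-square critical value for its degrees of freedom. The sketch below assumes that standard reading and uses a base `data.frame` with invented numbers as a stand-in for the returned 'data.table'; the columns `critical`, `p_value`, and `exceeds` are hypothetical additions.

```r
# Invented values mimicking the documented columns.
res <- data.frame(
  topic = c(11L, 12L, 13L),
  df    = c(50L, 48L, 45L),
  chisq = c(30.2, 70.5, 90.1)
)
alpha <- 0.05

# Upper-tail critical value and p-value for each row's degrees of freedom.
res$critical <- qchisq(1 - alpha, df = res$df)
res$p_value  <- pchisq(res$chisq, df = res$df, lower.tail = FALSE)

# Flag rows whose statistic exceeds the critical value at level alpha.
res$exceeds <- res$chisq > res$critical
print(res)
```

How a flagged row should be interpreted in terms of topic stability is defined by the reference cited in 'Details', not by this sketch.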

Author(s)

Francesco Grossetti francesco.grossetti@unibocconi.it

Craig M. Lewis craig.lewis@owen.vanderbilt.edu

References

Lewis, C. and Grossetti, F. (2019, forthcoming). A Statistical Approach for Optimal Topic Model Identification.

See Also

LDA, data.table

Examples

## Not run: 
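# lda_list and test1 are assumed to exist already: a list of LDA models
# in ascending order of topic number, and the number corresponding to
# the optimal topic model, respectively.
test2 <- topic_stability(lda_models = lda_list,
                         optimal_model = test1,
                         q = 0.00075,
                         alpha = 0.05)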

## End(Not run)

contefranz/OpTop documentation built on Feb. 14, 2022, 7:04 p.m.