coherence: Coherence of estimated topics

coherence R Documentation

Coherence of estimated topics

Description

Computes various coherence-based metrics for topic models. These metrics assess the quality of estimated topics from the co-occurrences of their top words. For best results, consider cleaning the initial tokens object with padding = TRUE.

Usage

coherence(
  x,
  nWords = 10,
  method = c("C_NPMI", "C_V"),
  window = NULL,
  NPMIs = NULL
)

Arguments

x

a model created with the LDA(), JST() or rJST() function and estimated with grow().

nWords

the number of words in each topic used for evaluation.

method

the coherence method used.

window

optional. If NULL, use the default window for each coherence metric (10 for C_NPMI and 110 for C_V). These defaults can be overridden by providing an integer or "boolean" to this argument, which sets a single window size for all measures. Has no effect if the NPMIs argument is also provided.

NPMIs

optional NPMI matrix. If provided, skip the computation of NPMI between words, substantially decreasing computing time.

Details

Currently, only C_NPMI and C_V are documented. The implementation follows Röder et al. (2015). The default sliding window is 10 for C_NPMI and 110 for C_V.
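
To illustrate the role of the sliding window, here is a minimal base R sketch of NPMI between two words, using the smoothed formulation given in Röder et al. (2015). It is not the package's implementation: the helpers sliding_windows() and npmi(), the toy token vector and the eps value are all hypothetical, introduced only for illustration.

```r
# Toy token sequence (hypothetical data, not taken from the package)
tokens <- c("rate", "inflation", "bank", "rate", "growth",
            "inflation", "rate", "bank", "policy", "growth")

# Hypothetical helper: all sliding windows of a given size over the tokens
sliding_windows <- function(tokens, size) {
  n <- length(tokens)
  if (n <= size) return(list(tokens))
  lapply(seq_len(n - size + 1), function(i) tokens[i:(i + size - 1)])
}

# Hypothetical helper: NPMI of two words from window-level probabilities.
# eps smooths the joint probability so that pairs which never co-occur
# get a score close to -1 instead of -Inf. Both words are assumed to
# occur in at least one window.
npmi <- function(w1, w2, windows, eps = 1e-12) {
  in1 <- vapply(windows, function(w) w1 %in% w, logical(1))
  in2 <- vapply(windows, function(w) w2 %in% w, logical(1))
  p1  <- mean(in1)        # share of windows containing w1
  p2  <- mean(in2)        # share of windows containing w2
  p12 <- mean(in1 & in2)  # share of windows containing both words
  log((p12 + eps) / (p1 * p2)) / -log(p12 + eps)
}

windows <- sliding_windows(tokens, size = 3)
score <- npmi("rate", "inflation", windows)
```

A larger window makes co-occurrence easier to observe and therefore tends to change the resulting scores, which is why the window argument matters when comparing C_NPMI and C_V values.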

Value

A vector or matrix containing the coherence score of each topic.

Author(s)

Olivier Delmarcelle

References

Röder, M., Both, A., & Hinneburg, A. (2015). Exploring the Space of Topic Coherence Measures. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, 399–408.
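
Examples

The following is a hedged usage sketch rather than an official example. It assumes that the sentopics package is installed, that the ECB_press_conferences_tokens dataset shipped with the package is available, and that grow() accepts the number of iterations as its second argument. The subset size and iteration count are arbitrary and kept small for speed.

```r
library(sentopics)

# Small subset of the tokens object assumed to ship with the package
toks <- ECB_press_conferences_tokens[1:10]

# Create an LDA model with the default settings, then estimate it
model <- LDA(toks)
model <- grow(model, 10)  # a few iterations only, for illustration

# Coherence of each topic with the default arguments
scores <- coherence(model)

# Same model, evaluated on the top 5 words with the C_V metric
scores_cv <- coherence(model, nWords = 5, method = "C_V")
```

With so few documents and iterations, the scores are only meaningful as a check that the call works; a real assessment needs a fully estimated model.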


sentopics documentation built on May 18, 2022, 5:05 p.m.