divergence: Optimize the number of topics for LDA
In seededlda: Seeded Sequential LDA for Topic Modeling

View source: R/utils.R

Optimize the number of topics for LDA

Description

divergence() computes the regularized topic divergence score, which helps users find the optimal number of topics for LDA.

Usage

divergence(
  x,
  min_size = 0.01,
  select = NULL,
  regularize = TRUE,
  newdata = NULL,
  ...
)

Arguments

`x`	an LDA model fitted by `textmodel_seededlda()` or `textmodel_lda()`.
`min_size`	the minimum size of topics for regularized topic divergence. Ignored when `regularize = FALSE`.
`select`	names of topics for which the divergence is computed.
`regularize`	if `TRUE`, returns the regularized divergence.
`newdata`	if provided, `theta` and `phi` are estimated through fresh Gibbs sampling.
`...`	additional arguments passed to `textmodel_lda()`.

Details

divergence() computes the average Jensen-Shannon divergence between all pairs of topic vectors in x$phi. The divergence score is maximized when the chosen number of topics, k, is optimal (Deveaud et al., 2014). The regularized divergence penalizes topics smaller than min_size to avoid fragmentation (Watanabe & Baturo, 2023).
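
As an illustration of the unregularized score described above, the following base R sketch averages the Jensen-Shannon divergence over all pairs of rows of a topics-by-words matrix. It is not the package's implementation: the helper names `kl()`, `jsd()` and `mean_pairwise_jsd()`, the base-2 logarithm, the toy `phi` matrix, and the assumption that topics are stored in rows are all choices made for this example, and the `min_size` regularization is left out.

```r
# Kullback-Leibler divergence KL(p || q), using log base 2 (an assumption
# of this sketch). Terms where p == 0 contribute nothing to the sum.
kl <- function(p, q) {
  nz <- p > 0
  sum(p[nz] * log2(p[nz] / q[nz]))
}

# Jensen-Shannon divergence: mean KL divergence of p and q from their midpoint.
jsd <- function(p, q) {
  m <- (p + q) / 2
  (kl(p, m) + kl(q, m)) / 2
}

# Mean JSD over all unordered pairs of rows (topics) of phi.
mean_pairwise_jsd <- function(phi) {
  pairs <- combn(nrow(phi), 2)
  mean(apply(pairs, 2, function(ij) jsd(phi[ij[1], ], phi[ij[2], ])))
}

# Hypothetical toy matrix: 3 topics over 4 words, each row summing to 1.
phi <- rbind(
  topic1 = c(0.7, 0.1, 0.1, 0.1),
  topic2 = c(0.1, 0.7, 0.1, 0.1),
  topic3 = c(0.1, 0.1, 0.7, 0.1)
)
score <- mean_pairwise_jsd(phi)
```

With base-2 logarithms each pairwise value lies between 0 (identical topics) and 1 (topics with no words in common), so a higher mean indicates better-separated topics.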

Value

Returns a single numeric value.

References

Deveaud, Romain et al. (2014). "Accurate and Effective Latent Concept Modeling for Ad Hoc Information Retrieval". Document Numérique. doi:10.3166/DN.17.1.61-84.

Watanabe, Kohei & Baturo, Alexander (2023). "Seeded Sequential LDA: A Semi-supervised Algorithm for Topic-specific Analysis of Sentences". Social Science Computer Review. doi:10.1177/08944393231178605.
