topicCoherence: Calculating Topic Coherence

View source: R/topicCoherence.R

Description

Implementation of the topic coherence measure proposed by Mimno et al. (2011).

Usage

topicCoherence(
  ldaresult,
  documents,
  num.words = 10,
  by.score = TRUE,
  sym.coherence = FALSE,
  epsilon = 1
)

Arguments

ldaresult

The result of a call to LDAgen.

documents

A list prepared by LDAprep.

num.words

Integer: Number of top words per topic used for calculating topic coherence (default: 10).

by.score

Logical: Should the top words be ranked by the score from top.topic.words (default: TRUE)?

sym.coherence

Logical: Should a symmetric version of the topic coherence be used for the calculations? If TRUE, the denominator of the topic coherence uses both word counts and not just one (default: FALSE).

epsilon

Numeric: Smoothing factor to avoid log(0). Default is 1. Stevens et al. recommend a smaller value.
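
For orientation, the coherence measure of Mimno et al. (2011) can be written with the smoothing constant made explicit. This is a sketch of the published formula, with epsilon in place of the constant 1 as discussed by Stevens et al. (2012). Whether topicCoherence evaluates it in exactly this form is an assumption, and the variant selected by sym.coherence is not shown.

```latex
C\bigl(t; V^{(t)}\bigr) \;=\; \sum_{m=2}^{M} \sum_{l=1}^{m-1}
  \log \frac{D\bigl(v_m^{(t)}, v_l^{(t)}\bigr) + \epsilon}{D\bigl(v_l^{(t)}\bigr)}
```

Here V^{(t)} = (v_1^{(t)}, ..., v_M^{(t)}) are the M top words of topic t (M corresponds to num.words), D(v) is the number of documents containing word v, and D(v, v') is the number of documents containing both v and v'.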

Value

A vector of topic coherences. The length of the vector corresponds to the number of topics in the model.
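
To make a single entry of this vector concrete, the following minimal sketch computes a Mimno-style coherence for one topic from tokenized documents. The function coherence_sketch and its arguments are hypothetical and not part of tosca; it assumes the asymmetric form of the measure and that every top word occurs in at least one document.

```r
# Hypothetical helper (not part of tosca): Mimno-style coherence for ONE topic.
# docs:     list of character vectors, one vector of tokens per document
# topwords: character vector of the topic's top words, highest ranked first
# epsilon:  smoothing constant added to the co-document frequency
coherence_sketch <- function(docs, topwords, epsilon = 1) {
  # TRUE/FALSE per document and top word: does the word occur in the document?
  occurs <- vapply(topwords, function(w) {
    vapply(docs, function(d) w %in% d, logical(1))
  }, logical(length(docs)))
  # Force a documents x words matrix, also when there is only one document
  occurs <- matrix(occurs, nrow = length(docs), dimnames = list(NULL, topwords))

  score <- 0
  for (m in seq_along(topwords)[-1]) {   # lower-ranked word
    for (l in seq_len(m - 1)) {          # every higher-ranked word
      co_df <- sum(occurs[, m] & occurs[, l])  # documents containing both words
      df_l  <- sum(occurs[, l])                # documents containing the higher-ranked word
      score <- score + log((co_df + epsilon) / df_l)
    }
  }
  score
}

docs <- list(c("fish", "man", "day"), c("fish", "thanks"), c("fisher", "integrals"))
coherence_sketch(docs, c("fish", "man"))  # 0, since log((1 + 1) / 2) = log(1)
```

Applying such a function to the top words of each topic in turn would give one value per topic, which matches the shape of the return value described above.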

References

Mimno, David and Wallach, Hannah M. and Talley, Edmund and Leenders, Miriam and McCallum, Andrew. Optimizing semantic coherence in topic models. EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2011.

Stevens, Keith and Andrzejewski, David and Buttler, David. Exploring topic coherence over many models and many topics. EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2012.

Examples

texts <- list(A="Give a Man a Fish, and You Feed Him for a Day.
Teach a Man To Fish, and You Feed Him for a Lifetime",
B="So Long, and Thanks for All the Fish",
C="A very able manipulative mathematician, Fisher enjoys a real mastery
in evaluating complicated multiple integrals.")

corpus <- textmeta(meta=data.frame(id=c("A", "B", "C", "D"),
title=c("Fishing", "Don't panic!", "Sir Ronald", "Berlin"),
date=c("1885-01-02", "1979-03-04", "1951-05-06", "1967-06-02"),
additionalVariable=1:4, stringsAsFactors=FALSE), text=texts)

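# Clean and tokenize the texts, then build the vocabulary
corpus <- cleanTexts(corpus)
wordlist <- makeWordlist(corpus$text)
# Convert the corpus into the format expected by LDAgen
ldaPrep <- LDAprep(text=corpus$text, vocab=wordlist$words)

# Fit an LDA model with 3 topics, then compute one coherence value per topic
result <- LDAgen(documents=ldaPrep, K = 3L, vocab=wordlist$words, num.words=3)
topicCoherence(ldaresult=result, documents=ldaPrep, num.words=5, by.score=TRUE)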

tosca documentation built on Oct. 28, 2021, 5:07 p.m.