read_diagnostics (R Documentation)
Uses the XML package and libxml to parse the diagnostics XML written by MALLET.
read_diagnostics(xml_file)
xml_file    the file holding the XML to be parsed.
A list of two data frames of diagnostic information, topics and words. The diagnostics are documented only sparsely, in the MALLET source code (http://hg-iesl.cs.umass.edu/hg/mallet: see src/cc/mallet/topics/TopicModelDiagnostics.java).
In topics, the columns include:
topic
The 1-indexed topic number.
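corpus_dist
The Kullback-Leibler (KL) divergence of the topic's word distribution from the word distribution of the corpus as a whole. A useful diagnostic of a topic's distinctiveness: the larger the value, the further the topic departs from the corpus.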
coherence
The topic coherence measure defined by Mimno et al. (2011), eq. (1): for every pair of the topic's top words, the log of the ratio of their co-document frequency (plus one) to the document frequency of the higher-ranked word, summed over all pairs. The number of top words is set by the n_top_words parameter of write_diagnostics.
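To make the corpus_dist and coherence definitions above concrete, here is a minimal base-R sketch computing both quantities from raw counts. The helper names kl_from_corpus and mimno_coherence are hypothetical: they are not part of MALLET or of this package. The sketch assumes corpus_dist is KL(topic || corpus) with no smoothing; MALLET's own computation may differ in such details. The coherence function follows eq. (1) of Mimno et al.: for top words v_1, ..., v_M (most probable first), it sums log((D(v_m, v_l) + 1) / D(v_l)) over all pairs l < m, where D(v) counts the documents containing v and D(v, v') those containing both.

```r
# Hypothetical helpers for illustration only (not MALLET or package API).

# KL divergence of a topic's word distribution p from the corpus word
# distribution q, assuming the direction KL(p || q) and no smoothing.
kl_from_corpus <- function(topic_counts, corpus_counts) {
  p <- topic_counts / sum(topic_counts)
  q <- corpus_counts / sum(corpus_counts)
  nz <- p > 0                      # words with p = 0 contribute nothing
  sum(p[nz] * log(p[nz] / q[nz]))
}

# Coherence after Mimno et al. (2011), eq. (1).
# dtm: document-term count matrix with words as column names.
# top_words: the topic's top words, most probable first.
mimno_coherence <- function(dtm, top_words) {
  present <- dtm[, top_words, drop = FALSE] > 0   # document x word occurrence
  total <- 0
  for (m in seq_along(top_words)[-1]) {
    for (l in seq_len(m - 1)) {
      co_df <- sum(present[, m] & present[, l])   # D(v_m, v_l)
      df_l <- sum(present[, l])                   # D(v_l)
      total <- total + log((co_df + 1) / df_l)
    }
  }
  total
}
```

For example, mimno_coherence(dtm, c("a", "b", "c")) scores a topic whose three top words are a, b and c; less negative values mean the top words tend to appear in the same documents.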
Because XML extracts numeric values as strings, the function attempts to coerce them back into numbers.
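The coercion step can be sketched in base R as follows. coerce_numeric_columns is a hypothetical helper, not necessarily how read_diagnostics does it: it converts a character column only when every non-missing value parses as a number, and leaves other columns (such as the words themselves) untouched.

```r
# Hypothetical helper: a sketch of string-to-number coercion for a data
# frame built from XML attribute values (all extracted as strings).
coerce_numeric_columns <- function(df) {
  df[] <- lapply(df, function(col) {
    if (!is.character(col)) return(col)
    num <- suppressWarnings(as.numeric(col))
    # Convert only if parsing introduced no new NAs
    if (all(is.na(num) == is.na(col))) num else col
  })
  df
}
```

Assigning into df[] rather than df keeps the result a data frame with its original column names.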
David Mimno et al. Optimizing Semantic Coherence in Topic Models. EMNLP 2011. http://www.cs.princeton.edu/~mimno/papers/mimno-semantic-emnlp.pdf.
See also write_diagnostics.