mallet.topic.hclust | R Documentation
Returns a hierarchical clustering of topics that can be plotted as a dendrogram.
There are two ways of measuring topic similarity: topics may contain some of
the same words, or they may appear in some of the same documents. The balance
parameter interpolates between the similarities determined by these two methods.
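To make the role of balance concrete, here is a minimal sketch of how word-level and document-level topic distances could be blended before clustering. It assumes a linear interpolation of two dist() results and a row normalization of the transposed doc.topics matrix; the helper name sketch_topic_hclust is hypothetical, and this is an illustration, not the package's own source code.

```r
library(stats)

# Hypothetical helper (not part of mallet): blends two topic-distance
# matrices with a balance weight, assuming a linear interpolation.
sketch_topic_hclust <- function(doc.topics, topic.words, balance = 0.3,
                                method = "euclidean", ...) {
  # Transpose documents x topics into topics x documents so each row is a topic.
  topic.docs <- t(doc.topics)
  # Assumed step: rescale each topic's row so it sums to 1.
  topic.docs <- topic.docs / rowSums(topic.docs)

  # Distances between topics from shared words and from shared documents.
  word.dist <- dist(topic.words, method = method)
  doc.dist  <- dist(topic.docs,  method = method)

  # balance = 1 keeps only word-level distance; balance = 0 only document-level.
  combined <- balance * word.dist + (1 - balance) * doc.dist
  hclust(combined, ...)
}

# Toy data (random values, for illustration only): 6 documents, 4 topics, 10 words.
set.seed(1)
doc.topics <- matrix(runif(6 * 4), nrow = 6)
doc.topics <- doc.topics / rowSums(doc.topics)
topic.words <- matrix(runif(4 * 10), nrow = 4)
topic.words <- topic.words / rowSums(topic.words)

h <- sketch_topic_hclust(doc.topics, topic.words, balance = 0.3)
```

With balance = 1 the sketch reduces to clustering dist(topic.words) alone, which matches the documented meaning of 1.0 as "use only word-level similarity".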
mallet.topic.hclust( doc.topics, topic.words, balance = 0.3, method = "euclidean", ... )
doc.topics | A documents-by-topics matrix of topic probabilities (see mallet.doc.topics). |
topic.words | A topics-by-words matrix of word probabilities (see mallet.topic.words). |
balance | A value between 0.0 (use only document-level similarity) and 1.0 (use only word-level similarity). |
method | Distance method to use in dist (default "euclidean"). |
... | Further arguments for hclust. |
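The argument list implies a shape constraint: doc.topics has one column per topic and topic.words has one row per topic, so the two must agree on the number of topics. The sketch below checks that constraint and the documented range of balance before clustering. The function check_topic_inputs is hypothetical and is not part of the mallet package.

```r
# Hypothetical input check (not part of mallet): verifies the shapes and
# ranges described in the argument list.
check_topic_inputs <- function(doc.topics, topic.words, balance) {
  stopifnot(
    is.matrix(doc.topics), is.matrix(topic.words),
    # doc.topics is documents x topics; topic.words is topics x words,
    # so both must describe the same number of topics.
    ncol(doc.topics) == nrow(topic.words),
    # balance must be a single number in [0, 1].
    is.numeric(balance), length(balance) == 1, balance >= 0, balance <= 1
  )
  # Return the number of topics without printing it.
  invisible(ncol(doc.topics))
}

doc.topics  <- matrix(0.25, nrow = 3, ncol = 4)   # 3 documents x 4 topics
topic.words <- matrix(0.1,  nrow = 4, ncol = 10)  # 4 topics x 10 words
n.topics <- check_topic_inputs(doc.topics, topic.words, balance = 0.3)
```

A topics-by-words matrix with a different number of rows (for example 5 instead of 4) makes the check stop with an error.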
An object of class hclust, which describes the tree produced by the clustering process.
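Because the result is an ordinary hclust object, the standard stats tools apply to it. The sketch below builds a stand-in hclust object from a made-up topics-by-words matrix (rather than calling mallet.topic.hclust, so it runs without a trained model) and uses cutree to split the topics into two groups.

```r
library(stats)

# Made-up topics-by-words matrix (4 topics x 3 words), for illustration only.
# topic1 and topic2 favour the first word; topic3 and topic4 favour the third.
topic.words <- rbind(
  topic1 = c(0.8, 0.1, 0.1),
  topic2 = c(0.7, 0.2, 0.1),
  topic3 = c(0.1, 0.1, 0.8),
  topic4 = c(0.1, 0.2, 0.7)
)

# Stand-in for the hclust object that mallet.topic.hclust returns.
h <- hclust(dist(topic.words, method = "euclidean"))

# Cut the tree into two groups; the result is named by topic.
groups <- cutree(h, k = 2)
print(groups)

# plot(h) or plot(as.dendrogram(h)) would draw the tree.
```

Here topic1 and topic2 land in one group and topic3 and topic4 in the other, since each pair is much closer to its partner than to the other pair.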
This function clusters topics with the hclust function, using the data
matrices returned by mallet.doc.topics and mallet.topic.words.
## Not run: 
# Load the package
library(mallet)

# Read in sotu example data
data(sotu)
sotu.instances <- mallet.import(id.array = row.names(sotu),
                                text.array = sotu[["text"]],
                                stoplist = mallet_stoplist_file_path("en"),
                                token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}")

# Create topic model
topic.model <- MalletLDA(num.topics = 10, alpha.sum = 1, beta = 0.1)
topic.model$loadDocuments(sotu.instances)

# Train topic model
topic.model$train(200)

# Create hierarchical clusters of topics
doc_topics <- mallet.doc.topics(topic.model, smoothed = TRUE, normalized = TRUE)
topic_words <- mallet.topic.words(topic.model, smoothed = TRUE, normalized = TRUE)
topic_labels <- mallet.topic.labels(topic.model)
plot(mallet.topic.hclust(doc_topics, topic_words, balance = 0.3), labels = topic_labels)
## End(Not run)