mallet_tidiers | R Documentation |
Tidy LDA models fit by the mallet package, which wraps the Mallet topic modeling package in Java. The arguments and return values are similar to lda_tidiers().
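As in lda_tidiers(), the beta matrix comes back in long form, with one row per topic-term pair and the columns topic, term and beta. The base R sketch below only illustrates that reshaping on a small, made-up topic-by-term matrix. It is not the package's implementation: the helper name tidy_beta_sketch() and the matrix values are hypothetical, and the sketch returns a plain data.frame rather than a tibble.

```r
# Hypothetical helper: reshape a topic-by-term matrix (rows = topics,
# columns = terms) into a long data frame, one row per topic-term pair
tidy_beta_sketch <- function(m) {
  data.frame(
    topic = rep(seq_len(nrow(m)), times = ncol(m)),  # topic index per cell
    term = rep(colnames(m), each = nrow(m)),         # term name per cell
    beta = as.vector(m),                             # column-major cell values
    stringsAsFactors = FALSE
  )
}

# Made-up per-topic word weights for two topics and three terms
m <- matrix(
  c(0.7, 0.2, 0.1,   # topic 1
    0.1, 0.3, 0.6),  # topic 2
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("market", "court", "police"))
)

td_beta <- tidy_beta_sketch(m)
td_beta
```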
```r
## S3 method for class 'jobjRef'
tidy(
  x,
  matrix = c("beta", "gamma"),
  log = FALSE,
  normalized = TRUE,
  smoothed = TRUE,
  ...
)

## S3 method for class 'jobjRef'
augment(x, data, ...)
```
x |
A jobjRef object of type RTopicModel, such as one created by mallet::MalletLDA(). |
matrix |
Whether to tidy the beta (per-term-per-topic, default) or gamma (per-document-per-topic) matrix. |
log |
Whether beta/gamma should be on a log scale (default FALSE). |
normalized |
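If TRUE (default), normalize the values into proportions: for gamma, each document sums to one across the topics; for beta, each topic sums to one across the words. If FALSE, values are the actual numbers of word-topic or document-topic assignments (integers, plus the smoothing parameter when smoothed = TRUE). |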
smoothed |
If TRUE (default), add the smoothing parameter to each count to avoid any values being zero. This smoothing parameter is initialized as … |
... |
Extra arguments, not used |
data |
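For augment(), a table with one row per original document-term pair, containing document and term columns. |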
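To make the interaction of normalized, smoothed and log concrete, here is a conceptual base R sketch on a made-up matrix of raw assignment counts (rows are documents, columns are topics, as for gamma). It assumes smoothing simply adds one constant to every count before normalizing; the helper name transform_counts_sketch() and the smoothing value 0.1 are hypothetical choices for illustration, not mallet's code or defaults.

```r
# Hypothetical helper mimicking the three options on a count matrix
transform_counts_sketch <- function(counts, smoothing,
                                    normalized = TRUE, smoothed = TRUE,
                                    log = FALSE) {
  out <- counts
  if (smoothed) {
    # add the smoothing constant so no cell is exactly zero
    out <- out + smoothing
  }
  if (normalized) {
    # divide each row by its total so every document sums to one
    out <- out / rowSums(out)
  }
  if (log) {
    # base::log makes clear we call the function, not the argument
    out <- base::log(out)
  }
  out
}

# Made-up topic assignment counts for two documents and three topics
counts <- matrix(
  c(3, 0, 1,   # doc1
    0, 4, 0),  # doc2
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"), c("topic1", "topic2", "topic3"))
)

raw_counts <- transform_counts_sketch(counts, smoothing = 0.1,
                                      normalized = FALSE, smoothed = FALSE)
smoothed_probs <- transform_counts_sketch(counts, smoothing = 0.1)
log_probs <- transform_counts_sketch(counts, smoothing = 0.1, log = TRUE)
# without smoothing, the zero counts become -Inf on the log scale
log_unsmoothed <- transform_counts_sketch(counts, smoothing = 0.1,
                                          smoothed = FALSE, log = TRUE)
```

The last call shows why smoothing is on by default in this sketch: any zero count turns into -Inf once logged.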
Note that the LDA models from mallet::MalletLDA() are technically a special case of S4 objects with class jobjRef. The tidiers are therefore implemented as jobjRef methods, with a check that the object's toString() output is as expected.
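The kind of guard described above can be sketched as follows. A real jobjRef needs rJava and a running JVM, so this sketch substitutes plain R lists whose toString element is a function, which lets x$toString() be called the same way. The helper name check_rtopicmodel_sketch(), the mock objects and their strings, and the pattern "RTopicModel" are assumptions for illustration, not the package's actual check.

```r
# Hypothetical guard: refuse objects whose toString() output
# does not mention RTopicModel
check_rtopicmodel_sketch <- function(x) {
  desc <- x$toString()
  if (!grepl("RTopicModel", desc, fixed = TRUE)) {
    stop("Expected a mallet RTopicModel, got: ", desc, call. = FALSE)
  }
  invisible(TRUE)
}

# Mock objects standing in for Java references (made-up strings)
mock_model <- list(toString = function() "RTopicModel@1a2b3c")
mock_other <- list(toString = function() "ArrayList@4d5e6f")

check_rtopicmodel_sketch(mock_model)  # passes silently
```

Calling check_rtopicmodel_sketch(mock_other) raises an error instead of returning.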
augment() must be given a data argument with one row per original document-term pair, such as the output of the tdm_tidiers, containing the columns document and term. It returns the same data with an additional column, .topic, giving the topic assignment for each document-term combination.
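The input/output contract of augment() can be illustrated without mallet. The base R sketch below attaches a .topic column to a small, made-up document-term table by looking up each pair in a hypothetical table of per-pair assignments, keeping every input row in its original order. It does not show how mallet derives those assignments; the data, the assignments and the helper name augment_sketch() are invented for illustration.

```r
# Made-up input: one row per document-term pair, as augment() expects
td <- data.frame(
  document = c("1", "1", "2"),
  term = c("market", "stock", "court"),
  count = c(2, 1, 3),
  stringsAsFactors = FALSE
)

# Hypothetical per-pair topic assignments, deliberately in another order
assignments <- data.frame(
  document = c("2", "1", "1"),
  term = c("court", "stock", "market"),
  .topic = c(2L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Keep every input row in its original order and add a .topic column
augment_sketch <- function(data, assignments) {
  key_data <- paste(data$document, data$term, sep = "\t")
  key_assign <- paste(assignments$document, assignments$term, sep = "\t")
  data$.topic <- assignments$.topic[match(key_data, key_assign)]
  data
}

out <- augment_sketch(td, assignments)
out
```

match() is used instead of merge() so the output rows stay in the same order as the input.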
See also: lda_tidiers(), mallet::mallet.doc.topics(), mallet::mallet.topic.words()
```r
## Not run:
library(mallet)
library(dplyr)
library(tidytext)  # tidy() for the DocumentTermMatrix, and stop_words
library(ggplot2)   # ggplot() used below

data("AssociatedPress", package = "topicmodels")
td <- tidy(AssociatedPress)

# mallet needs a file with stop words
tmp <- tempfile()
writeLines(stop_words$word, tmp)

# two vectors: one with document IDs, one with text
docs <- td %>%
  group_by(document = as.character(document)) %>%
  summarize(text = paste(rep(term, count), collapse = " "))

docs <- mallet.import(docs$document, docs$text, tmp)

# create and run a topic model
topic_model <- MalletLDA(num.topics = 4)
topic_model$loadDocuments(docs)
topic_model$train(20)

# tidy the word-topic combinations
td_beta <- tidy(topic_model)
td_beta

# Examine the four topics
td_beta %>%
  group_by(topic) %>%
  top_n(8, beta) %>%
  ungroup() %>%
  mutate(term = reorder(term, beta)) %>%
  ggplot(aes(term, beta)) +
  geom_col() +
  facet_wrap(~ topic, scales = "free") +
  coord_flip()

# find the assignments of each word in each document
assignments <- augment(topic_model, td)
assignments
## End(Not run)
```