mallet_tidiers | R Documentation |
Tidy LDA models fit by the mallet package, which wraps the Mallet topic
modeling package in Java. The arguments and return values
are similar to lda_tidiers().
## S3 method for class 'jobjRef'
tidy(
  x,
  matrix = c("beta", "gamma"),
  log = FALSE,
  normalized = TRUE,
  smoothed = TRUE,
  ...
)
## S3 method for class 'jobjRef'
augment(x, data, ...)
x |
A jobjRef object, of type RTopicModel, such as created
by mallet::MalletLDA(). |
matrix |
Whether to tidy the beta (per-term-per-topic, default) or gamma (per-document-per-topic) matrix. |
log |
Whether beta/gamma should be on a log scale, default FALSE |
normalized |
If true (default), normalize so that each document or word sums to one across the topics. If false, values will be integers representing the actual number of word-topic or document-topic assignments. |
smoothed |
If true (default), add the smoothing parameter to each
to avoid any values being zero. This smoothing parameter is initialized
as |
... |
Extra arguments, not used |
data |
For |
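As a rough illustration of how the log, normalized, and smoothed arguments relate, the sketch below applies the same three steps to a toy document-topic count matrix in base R. The counts and the smoothing value are invented for the example; this is not the package's implementation, and with a fitted model the smoothing parameter comes from the model itself.

```r
# Toy document-topic assignment counts: 3 documents (rows) x 2 topics (columns).
# These numbers are invented for illustration only.
counts <- matrix(
  c(4, 0,
    1, 3,
    2, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(document = c("d1", "d2", "d3"), topic = c("1", "2"))
)

# Hypothetical smoothing value (an assumption, not taken from a real model).
smoothing <- 0.5

# smoothed = TRUE: add the smoothing parameter so that no cell is zero
smoothed_counts <- counts + smoothing

# normalized = TRUE: rescale each document so its topics sum to one
gamma <- smoothed_counts / rowSums(smoothed_counts)

# log = TRUE: report the same values on a log scale
log_gamma <- log(gamma)

gamma
rowSums(gamma)
```

With normalized = FALSE and smoothed = FALSE, the analogue of the result would be the raw integer counts in `counts`.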
Note that the LDA models from mallet::MalletLDA() are technically a special
case of S4 objects with class jobjRef. These are thus implemented as jobjRef
tidiers, with a check for whether the toString output is as expected.
augment must be provided a data argument containing one row per original
document-term pair, such as is returned by tdm_tidiers, containing columns
document and term. It returns that same data with an additional column
.topic with the topic assignment for that document-term combination.
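The shape that augment expects can be checked before calling it. The following is a minimal base R sketch, assuming a small hand-made table; the object name `td` and its contents are hypothetical, and the final call is left commented out because it needs a fitted model.

```r
# Hypothetical tidy document-term table: one row per document-term pair.
td <- data.frame(
  document = c("d1", "d1", "d2"),
  term     = c("court", "trial", "market"),
  count    = c(2, 1, 3),
  stringsAsFactors = FALSE
)

# The required columns must be present ...
has_columns <- all(c("document", "term") %in% names(td))

# ... and each document-term pair should appear only once.
one_row_per_pair <- !any(duplicated(td[c("document", "term")]))

stopifnot(has_columns, one_row_per_pair)

# With a fitted model (not created here), the call would then be:
# assignments <- augment(topic_model, td)
```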
lda_tidiers(), mallet::mallet.doc.topics(), mallet::mallet.topic.words()
## Not run:
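library(mallet)
library(dplyr)
library(tidytext) # provides stop_words and the tidy() method for document-term matrices
library(ggplot2)  # needed for the plot of the topics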
data("AssociatedPress", package = "topicmodels")
td <- tidy(AssociatedPress)
# mallet needs a file with stop words
tmp <- tempfile()
writeLines(stop_words$word, tmp)
# two vectors: one with document IDs, one with text
docs <- td %>%
  group_by(document = as.character(document)) %>%
  summarize(text = paste(rep(term, count), collapse = " "))
docs <- mallet.import(docs$document, docs$text, tmp)
# create and run a topic model
topic_model <- MalletLDA(num.topics = 4)
topic_model$loadDocuments(docs)
topic_model$train(20)
# tidy the word-topic combinations
td_beta <- tidy(topic_model)
td_beta
# Examine the four topics
td_beta %>%
  group_by(topic) %>%
  top_n(8, beta) %>%
  ungroup() %>%
  mutate(term = reorder(term, beta)) %>%
  ggplot(aes(term, beta)) +
  geom_col() +
  facet_wrap(~ topic, scales = "free") +
  coord_flip()
# find the assignments of each word in each document
assignments <- augment(topic_model, td)
assignments
## End(Not run)