mallet.subset.topic.words | R Documentation |
This function returns a matrix of word probabilities for each topic, similar to
mallet.topic.words, but estimated from a subset of the documents
in the corpus. The model assumes that topics are the same no matter where they
are used, but we know this is often not the case. This function lets us test
whether some words are used more or less than we expect in a particular set
of documents.
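The idea can be illustrated with a toy sketch in base R. This is not the package's implementation: the `tokens` data frame, the `count_topic_words` helper, and all values are hypothetical, and the sketch assumes the matrix is built by counting per-token topic assignments within the selected documents.

```r
# Hypothetical toy data: one row per token, with its document, word, and assigned topic.
tokens <- data.frame(
  doc   = c(1, 1, 1, 2, 2, 3, 3, 3),
  word  = c("tax", "war", "tax", "war", "peace", "tax", "peace", "peace"),
  topic = c(1, 2, 1, 2, 2, 1, 2, 1),
  stringsAsFactors = FALSE
)
vocab <- sort(unique(tokens$word))
num_topics <- 2

# Count word-topic assignments, optionally restricted to a subset of documents.
# subset_docs is a logical vector with one entry per document.
count_topic_words <- function(tokens, subset_docs = NULL) {
  if (!is.null(subset_docs)) {
    tokens <- tokens[subset_docs[tokens$doc], ]
  }
  counts <- table(factor(tokens$topic, levels = seq_len(num_topics)),
                  factor(tokens$word, levels = vocab))
  matrix(as.vector(counts), nrow = num_topics, dimnames = list(NULL, vocab))
}

# Topics-by-vocabulary count matrices for the whole corpus and for documents 2 and 3.
full_counts   <- count_topic_words(tokens)
subset_counts <- count_topic_words(tokens, subset_docs = c(FALSE, TRUE, TRUE))

# Normalize each row so that every topic's word weights sum to one.
full_probs   <- full_counts / rowSums(full_counts)
subset_probs <- subset_counts / rowSums(subset_counts)

# Positive entries mark words used more than expected in the subset.
prob_diff <- subset_probs - full_probs
print(round(prob_diff, 3))
```

Comparing the two normalized matrices row by row shows which words a topic favors more or less within the chosen documents than in the corpus as a whole.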
mallet.subset.topic.words(topic.model, subset.docs, normalized = FALSE, smoothed = FALSE)
topic.model |
A |
subset.docs |
A logical vector of |
normalized |
If |
smoothed |
If |
A number of topics by vocabulary size matrix for the included documents.
mallet.topic.words
## Not run:
# Read in sotu example data
data(sotu)
sotu.instances <-
  mallet.import(id.array = row.names(sotu),
                text.array = sotu[["text"]],
                stoplist = mallet_stoplist_file_path("en"),
                token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}")

# Create topic model
topic.model <- MalletLDA(num.topics=10, alpha.sum = 1, beta = 0.1)
topic.model$loadDocuments(sotu.instances)

# Train topic model
topic.model$train(200)

# Extract subcorpus topic word matrix
post1975_topic_words <- mallet.subset.topic.words(topic.model, sotu[["year"]] > 1975)
mallet.top.words(topic.model, word.weights = post1975_topic_words[2,], num.top.words = 5)

## End(Not run)