MalletLDA | R Documentation |
This function creates a Java cc.mallet.topics.RTopicModel object that wraps a
Mallet topic model trainer Java object, cc.mallet.topics.ParallelTopicModel.
Any method of this Java object can be called directly from R with the $ operator.
For example, the call
topic.model$setAlphaOptimization(20, 50)
invokes the corresponding Java method, which passes this update to the underlying model.
MalletLDA(num.topics = 10, alpha.sum = 5, beta = 0.01)
| Argument | Description |
|---|---|
| num.topics | The number of topics to use. If not specified, this defaults to 10. |
| alpha.sum | The magnitude of the Dirichlet prior over the topic distribution of a document. The default value is 5.0. With 10 topics, this setting gives a Dirichlet with parameter α_k = alpha.sum / num.topics = 5 / 10 = 0.5 for every topic. Intuitively, this parameter is a number of "pseudo-words", divided evenly among all topics, that are present in every document no matter how the other words are allocated to topics. This is an initial value, which may change during training if hyperparameter optimization is active. |
| beta | The per-word weight of the Dirichlet prior over topic-word distributions. The default value is 0.01. The magnitude of the prior (the sum of this parameter over all words, i.e. beta × vocabulary size) is therefore determined by the number of words in the vocabulary. Again, this value may change due to hyperparameter optimization. |
Returns a cc.mallet.topics.RTopicModel object.
```r
## Not run:
# Read in sotu example data
data(sotu)
sotu.instances <- mallet.import(id.array = row.names(sotu),
                                text.array = sotu[["text"]],
                                stoplist = mallet_stoplist_file_path("en"),
                                token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}")

# Create topic model
topic.model <- MalletLDA(num.topics = 10, alpha.sum = 1, beta = 0.1)
topic.model$loadDocuments(sotu.instances)

# Train topic model
topic.model$train(200)

# Extract results
doc_topics <- mallet.doc.topics(topic.model, smoothed = TRUE, normalized = TRUE)
topic_words <- mallet.topic.words(topic.model, smoothed = TRUE, normalized = TRUE)
# Top 5 words of topic 2
top_words <- mallet.top.words(topic.model, word.weights = topic_words[2, ], num.top.words = 5)
## End(Not run)
```
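The last step of the example ranks one row of topic-word weights and keeps the highest-weighted words. A base R sketch of that ranking step, assuming a weight vector named by vocabulary word; the helper top_n_words and the toy vocabulary and weights are hypothetical and not the package's own implementation of mallet.top.words.

```r
# Hypothetical toy data: a normalized topic-word matrix (2 topics x 5 words).
# In a real session the rows would come from mallet.topic.words() and the
# words from the model's vocabulary.
vocab <- c("war", "peace", "tax", "trade", "law")
topic_words <- rbind(
  c(0.10, 0.05, 0.40, 0.30, 0.15),
  c(0.45, 0.35, 0.05, 0.05, 0.10)
)
colnames(topic_words) <- vocab

# Return the n highest-weighted words of one topic, largest weight first.
top_n_words <- function(weights, n) {
  ord <- order(weights, decreasing = TRUE)[seq_len(min(n, length(weights)))]
  data.frame(word = names(weights)[ord],
             weight = unname(weights[ord]),
             stringsAsFactors = FALSE)
}

# Returns the rows for war, peace and law, in that order
top_n_words(topic_words[2, ], n = 3)
```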