mallet.doc.topics: Retrieve a matrix of topic weights for every document

mallet.doc.topics    R Documentation

Description

This function returns a matrix with one row for every document and one column for every topic.

Usage

mallet.doc.topics(topic.model, normalized = FALSE, smoothed = FALSE)

Arguments

topic.model

A cc.mallet.topics.RTopicModel object created by MalletLDA.

normalized

If TRUE, normalize each row so that the topic weights for each document sum to one. If FALSE, values will be integers (possibly plus the smoothing constant) giving the number of words in each document assigned to each topic.

smoothed

If TRUE, add the model's smoothing parameter (initial value specified as alpha.sum in MalletLDA) to the counts. If FALSE, many values will be zero.
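
The effect of the normalized and smoothed arguments can be illustrated on a toy count matrix without running MALLET. The following is a minimal sketch, not the package's implementation: the counts are made up, and the per-topic smoothing constant (alpha, taken here as alpha.sum divided by the number of topics) and the order of operations (smooth first, then normalize) are assumptions.

```r
# Toy counts: 3 documents (rows) by 2 topics (columns). Values are made up.
counts <- matrix(c(4, 0,
                   1, 3,
                   0, 5),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("doc", 1:3), paste0("topic", 1:2)))

# Hypothetical per-topic smoothing constant (assumed: alpha.sum / num.topics,
# with alpha.sum = 1).
alpha <- 1 / ncol(counts)

# Roughly what smoothed = TRUE does: no cell is exactly zero afterwards.
smoothed_counts <- counts + alpha

# Roughly what normalized = TRUE does: divide each row by its row sum.
normalized <- smoothed_counts / rowSums(smoothed_counts)

# Every document's topic weights now sum to one.
rowSums(normalized)
```

With smoothed = FALSE the zero cells in counts would stay zero after normalization, which matters if the weights are later passed through a logarithm.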

Value

A numeric matrix with one row per document and one column per topic.
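
Because rows are documents and columns are topics, common summaries reduce to row-wise or column-wise operations. The sketch below uses a small made-up matrix as a stand-in for a normalized result; the object names are illustrative only and are not part of the package.

```r
# Stand-in for a normalized result: 3 documents by 2 topics (made-up values).
doc_topics <- matrix(c(0.9, 0.1,
                       0.3, 0.7,
                       0.5, 0.5),
                     nrow = 3, byrow = TRUE)

# Index of the highest-weight topic in each row (ties go to the first topic).
dominant_topic <- apply(doc_topics, 1, which.max)

# Mean weight of each topic across all documents.
mean_topic_weight <- colMeans(doc_topics)
```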

Examples

## Not run: 
# Read in sotu example data
data(sotu)
sotu.instances <-
   mallet.import(id.array = row.names(sotu),
                 text.array = sotu[["text"]],
                 stoplist = mallet_stoplist_file_path("en"),
                 token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}")

# Create topic model
topic.model <- MalletLDA(num.topics = 10, alpha.sum = 1, beta = 0.1)
topic.model$loadDocuments(sotu.instances)

# Train topic model (200 sampling iterations)
topic.model$train(200)

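# Extract results
# Document-topic weights: one row per document, one column per topic
doc_topics <- mallet.doc.topics(topic.model, smoothed = TRUE, normalized = TRUE)
# Topic-word weights: one row per topic, one column per vocabulary word
topic_words <- mallet.topic.words(topic.model, smoothed = TRUE, normalized = TRUE)
# Five highest-weight words for topic 2
top_words <- mallet.top.words(topic.model, word.weights = topic_words[2, ], num.top.words = 5)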

## End(Not run)



mallet documentation built on July 20, 2022, 5:08 p.m.