mallet_get_sparse_word_weights_matrix: Get sparse beta matrix
In PolMine/biglda: Fast LDA Topic Modelling for Big Corpora

mallet_get_sparse_word_weights_matrix

R Documentation

Get sparse beta matrix

Description

The beta matrix reporting word weights for topics can grow extremely large. The straightforward ways to get the matrix can be slow and very memory-inefficient. This function uses the 'topicXMLReport()' method of the 'ParallelTopicModel' class, which is the most memory-efficient solution we know of at this stage. The trick is that weights are reported only for the top N words, so the data can be processed as a sparse matrix. The examples show that the result is equivalent to the output of the 'getTopicWords()' method. Note, however, that the matrix is neither normalized nor smoothed nor logarithmized.
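The following is a minimal sketch of why keeping only the top N weights per topic makes a sparse representation possible. It does not use Mallet or biglda: the toy matrix, the variable names (`beta_dense`, `beta_sparse`, `top_n`) and the choice of the 'Matrix' package are assumptions made for illustration only, not the implementation behind this function.

```r
library(Matrix)

set.seed(42)
n_topics <- 4L
n_words <- 1000L
top_n <- 10L

# Toy dense "beta" matrix of raw counts (topics x words), stored as doubles.
beta_dense <- matrix(
  as.numeric(rpois(n_topics * n_words, lambda = 5)),
  nrow = n_topics
)

# For each topic, find the column indices of the top_n largest weights.
top_idx <- lapply(
  seq_len(n_topics),
  function(k) order(beta_dense[k, ], decreasing = TRUE)[seq_len(top_n)]
)

# Keep only those cells; every other cell is an implicit zero.
i <- rep(seq_len(n_topics), each = top_n)
j <- unlist(top_idx)
beta_sparse <- sparseMatrix(
  i = i, j = j, x = beta_dense[cbind(i, j)],
  dims = c(n_topics, n_words)
)

# The top_n weights of every topic are unchanged in the sparse matrix.
top_values <- function(x) sort(x, decreasing = TRUE)[seq_len(top_n)]
same_top <- isTRUE(all.equal(
  apply(as.matrix(beta_sparse), 1, top_values),
  apply(beta_dense, 1, top_values),
  check.attributes = FALSE
))

# Compare the memory footprint of both representations.
size_dense <- object.size(beta_dense)
size_sparse <- object.size(beta_sparse)
```

In this toy setting only `n_topics * top_n` cells are stored explicitly, which is why the sparse object stays small even when the vocabulary grows.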

Usage

mallet_get_sparse_word_weights_matrix(x, n_topics = 50L, destfile = tempfile())

Arguments

`x`	A 'ParallelTopicModel' class object
`n_topics`	A length-one 'integer' vector, the number of topics.
`destfile`	Length-one 'character' vector, the filename of the output file.

Examples

## Not run: 
# x is assumed to be any ParallelTopicModel class object
m <- mallet_get_sparse_word_weights_matrix(x)
beta_sparse <- as.matrix(m)

# Dense reference matrix: raw weights, neither normalized nor smoothed
beta_dense <- rJava::.jevalArray(x$getTopicWords(FALSE, FALSE), simplify = TRUE)
rownames(beta_dense) <- as.character(seq_len(nrow(beta_dense)))

# The largest weight of the first topic is the same in both matrices
identical(max(beta_sparse[1,]), as.integer(max(beta_dense[1,])))

# The five largest weights of the first topic are the same
identical(
  unname(head(beta_sparse[1,][order(beta_sparse[1,], decreasing = TRUE)], 5)),
  as.integer(head(beta_dense[1,][order(beta_dense[1,], decreasing = TRUE)], 5))
)

# The 50 largest weights of every topic are the same
.fn <- function(x) as.integer(unname(head(x[order(x, decreasing = TRUE)], 50)))
identical(apply(beta_sparse, 1, .fn), apply(beta_dense, 1, .fn))

## End(Not run)

PolMine/biglda documentation built on Feb. 25, 2023, 11:24 p.m.