mallet_get_sparse_word_weights_matrix: Get sparse beta matrix
In PolMine/polmineR.topics: tools for topicmodelling


View source: R/mallet.R

The beta matrix reporting word weights for topics can grow extremely large. The straightforward ways of getting the matrix can be slow and very memory-inefficient. This function uses the topicXMLReport() method of the ParallelTopicModel, which is the most memory-efficient solution we know of at this stage. The trick is that weights are only reported for the top N words of each topic, so the data can be processed as a sparse matrix, which keeps memory usage low. See the examples for a check that the result is indeed equivalent to that of the getTopicWords() method. Note, however, that the matrix is neither normalized nor smoothed nor logarithmized.
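The memory argument can be illustrated with a small sketch that does not involve MALLET at all. It assumes the Matrix package is available and uses randomly generated toy counts; the names `beta_dense`, `beta_sparse`, `top_n` and `top_weights` are hypothetical and chosen for illustration only. This is not how the function is implemented internally, it only shows why keeping the top N weights per topic lends itself to a sparse representation.

```r
library(Matrix)  # assumption: the Matrix package is installed

set.seed(42)
n_topics <- 4L
vocab_size <- 1000L
top_n <- 5L

# Toy dense topic-by-word count matrix (made-up data, not MALLET output)
beta_dense <- matrix(
  rpois(n_topics * vocab_size, lambda = 3),
  nrow = n_topics, ncol = vocab_size
)

# For each topic, keep only the column indices of the top_n heaviest words
top_cols <- lapply(
  seq_len(n_topics),
  function(i) order(beta_dense[i, ], decreasing = TRUE)[seq_len(top_n)]
)

# Assemble (row, column, value) triplets and build the sparse matrix
i <- rep(seq_len(n_topics), each = top_n)
j <- unlist(top_cols)
beta_sparse <- sparseMatrix(
  i = i, j = j, x = beta_dense[cbind(i, j)],
  dims = c(n_topics, vocab_size)
)

# Only n_topics * top_n cells are stored explicitly
n_stored <- length(beta_sparse@x)

# The top weights per topic survive the truncation
top_weights <- function(x) unname(head(sort(x, decreasing = TRUE), top_n))
same_top <- isTRUE(all.equal(
  apply(as.matrix(beta_sparse), 1, top_weights),
  apply(beta_dense, 1, top_weights),
  check.attributes = FALSE
))
```

In this toy setting the dense matrix holds `n_topics * vocab_size` cells, whereas the sparse one stores `n_topics * top_n` values plus their indices. With a realistic vocabulary and number of topics the difference grows accordingly.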

mallet_get_sparse_word_weights_matrix(x, n_topics = 50L, destfile = tempfile())

`x`	A `ParallelTopicModel` class object.
`n_topics`	A length-one `integer` vector, the number of topics.
`destfile`	A length-one `character` vector, the filename of the output file.

## Not run: 
# x is assumed to be any ParallelTopicModel class object
m <- mallet_get_sparse_word_weights_matrix(x)
beta_sparse <- as.matrix(m)

# Reference: raw weights from getTopicWords(normalized = FALSE, smoothed = FALSE)
beta_dense <- rJava::.jevalArray(x$getTopicWords(FALSE, FALSE), simplify = TRUE)
rownames(beta_dense) <- as.character(seq_len(nrow(beta_dense)))

# The maximum weight of the first topic should be the same in both matrices
identical(max(beta_sparse[1, ]), as.integer(max(beta_dense[1, ])))

# The five highest weights of the first topic should be the same
identical(
  unname(head(beta_sparse[1, ][order(beta_sparse[1, ], decreasing = TRUE)], 5)),
  as.integer(head(beta_dense[1, ][order(beta_dense[1, ], decreasing = TRUE)], 5))
)

# The 50 highest weights of every topic should be the same
.fn <- function(x) as.integer(unname(head(x[order(x, decreasing = TRUE)], 50)))
identical(apply(beta_sparse, 1, .fn), apply(beta_dense, 1, .fn))

## End(Not run)