The beta matrix reporting word weights for topics can grow extremely large. The
straightforward ways to get the matrix can be slow and very memory-inefficient.
This function uses the topicXMLReport() method of the ParallelTopicModel class,
which is the most memory-efficient solution we know of at this stage. The trick is that
weights are only reported for the top N words, so the data can be processed as
a sparse matrix, which is what keeps memory usage low. See the examples for a check
that the result is indeed equivalent to that of the getTopicWords() method. Note
however that the matrix is neither normalized nor smoothed nor logarithmized.
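
The top-N idea can be illustrated with a small, self-contained sketch. It does not call MALLET and is not the implementation of this function: the toy weights, the variable names and the use of the Matrix package are all assumptions made for illustration only. It shows that keeping only the top N weights per topic yields a sparse matrix whose top-N weights match those of the full dense matrix.

```r
library(Matrix)  # assumption: Matrix is used here only for illustration

set.seed(42)
n_topics <- 3L
vocab <- paste0("w", 1:8)

# Toy topic-by-word weights (hypothetical data, not MALLET output).
# sample.int() draws without replacement, so all weights are distinct.
dense <- matrix(
  sample.int(100L, n_topics * length(vocab)),
  nrow = n_topics,
  dimnames = list(as.character(seq_len(n_topics)), vocab)
)

top_n <- 3L

# For every topic, the column indices of its top_n heaviest words
top_idx <- lapply(
  seq_len(n_topics),
  function(i) order(dense[i, ], decreasing = TRUE)[seq_len(top_n)]
)

# Store only the top_n weights per topic; all other cells stay implicit zeros
sparse <- sparseMatrix(
  i = rep(seq_len(n_topics), each = top_n),
  j = unlist(top_idx),
  x = as.numeric(unlist(lapply(
    seq_len(n_topics), function(i) dense[i, top_idx[[i]]]
  ))),
  dims = dim(dense),
  dimnames = dimnames(dense)
)

# Top-n weights of row i, in decreasing order, without names
top_weights <- function(m, i, n) {
  head(sort(as.numeric(m[i, ]), decreasing = TRUE), n)
}

# The sparse and the dense matrix agree on the top_n weights of every topic
same_top <- all(vapply(
  seq_len(n_topics),
  function(i) identical(top_weights(sparse, i, top_n), top_weights(dense, i, top_n)),
  logical(1)
))
```

Only n_topics * top_n cells are stored explicitly here, which is why memory use scales with N rather than with the vocabulary size.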
mallet_get_sparse_word_weights_matrix(x, n_topics = 50L, destfile = tempfile())
x | A
n_topics | A length-one
destfile | Length-one
## Not run:
# x is assumed to be any ParallelTopicModel class object
m <- mallet_get_sparse_word_weights_matrix(x)
beta_sparse <- as.matrix(m)

# Dense reference matrix from getTopicWords(), with both flags set to FALSE
beta_dense <- rJava::.jevalArray(x$getTopicWords(FALSE, FALSE), simplify = TRUE)
rownames(beta_dense) <- as.character(seq_len(nrow(beta_dense)))

# The largest weight of the first topic is the same in both matrices
identical(max(beta_sparse[1,]), as.integer(max(beta_dense[1,])))

# The five largest weights of the first topic are the same
identical(
  unname(head(beta_sparse[1,][order(beta_sparse[1,], decreasing = TRUE)], 5)),
  as.integer(head(beta_dense[1,][order(beta_dense[1,], decreasing = TRUE)], 5))
)

# The 50 largest weights of every topic are the same
.fn <- function(x) as.integer(unname(head(x[order(x, decreasing = TRUE)], 50)))
identical(apply(beta_sparse, 1, .fn), apply(beta_dense, 1, .fn))
## End(Not run)