mallet_get_sparse_word_weights_matrix: Get sparse beta matrix


View source: R/mallet.R

Description

The beta matrix reporting word weights for topics can grow extremely large. The straightforward ways to get the matrix can be slow and very memory-inefficient. This function uses the topicXMLReport() method of the ParallelTopicModel, which is the most memory-efficient solution we know of at this stage. The trick is that weights are reported only for the top N words, so the data can be processed as a sparse matrix. The examples demonstrate that the result is equivalent to that of the getTopicWords() method. Note, however, that the matrix is neither normalized, nor smoothed, nor logarithmized.
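The idea of storing only the top N weights per topic can be illustrated with a minimal sketch on toy data. It does not call MALLET or this package: the names `beta_dense`, `beta_sparse`, `top_n` and the random weights are hypothetical, and the Matrix package is used here only as one possible sparse representation, which is an assumption and not necessarily the class this function returns.

```r
library(Matrix)  # recommended package that ships with R

set.seed(42)
n_topics <- 4L
n_words <- 1000L
top_n <- 5L

# Toy stand-in for a dense topic-by-word weight matrix (hypothetical data;
# sampling without replacement avoids ties between weights).
beta_dense <- matrix(
  as.numeric(sample.int(100000L, n_topics * n_words)),
  nrow = n_topics, ncol = n_words
)

# For every topic (row), find the column indices of the top_n heaviest words.
# apply() returns a top_n x n_topics matrix: one column per topic.
top_cols <- apply(
  beta_dense, 1,
  function(w) order(w, decreasing = TRUE)[seq_len(top_n)]
)

# Triplet representation: row index, column index, weight.
i <- rep(seq_len(n_topics), each = top_n)
j <- as.vector(top_cols)
beta_sparse <- sparseMatrix(
  i = i, j = j, x = beta_dense[cbind(i, j)],
  dims = c(n_topics, n_words)
)

# Only n_topics * top_n cells are stored explicitly.
nnzero(beta_sparse)

# The top_n weights per topic agree between both representations.
top_weights <- function(w) head(sort(w, decreasing = TRUE), top_n)
same_top <- all(
  apply(as.matrix(beta_sparse), 1, top_weights) ==
    apply(beta_dense, 1, top_weights)
)
same_top
```

All words outside the top N of a topic appear with weight zero in the sparse matrix, so it is suited to inspecting the most important words per topic, not to recovering the full distribution.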

Usage

mallet_get_sparse_word_weights_matrix(x, n_topics = 50L, destfile = tempfile())

Arguments

x

A ParallelTopicModel class object

n_topics

A length-one integer vector, the number of topics.

destfile

A length-one character vector, the filename of the output file.

Examples

## Not run:
# x is assumed to be any ParallelTopicModel class object
m <- mallet_get_sparse_word_weights_matrix(x)
beta_sparse <- as.matrix(m)

# Dense reference matrix: getTopicWords(normalized = FALSE, smoothed = FALSE)
beta_dense <- rJava::.jevalArray(x$getTopicWords(FALSE, FALSE), simplify = TRUE)
rownames(beta_dense) <- as.character(seq_len(nrow(beta_dense)))

# The maximum weight of the first topic is the same in both matrices
identical(max(beta_sparse[1, ]), as.integer(max(beta_dense[1, ])))

# The five largest weights of the first topic match
identical(
  unname(head(beta_sparse[1, ][order(beta_sparse[1, ], decreasing = TRUE)], 5)),
  as.integer(head(beta_dense[1, ][order(beta_dense[1, ], decreasing = TRUE)], 5))
)

# The 50 largest weights match for every topic
.fn <- function(x) as.integer(unname(head(x[order(x, decreasing = TRUE)], 50)))
identical(apply(beta_sparse, 1, .fn), apply(beta_dense, 1, .fn))

## End(Not run)

PolMine/polmineR.topics documentation built on March 6, 2020, 6:03 p.m.