word_weights: Process large topic word weights matrices

load_word_weights    R Documentation

Description

The word weights matrix (the weights of words for topics) can become very large when there is a large number of topics and a substantially sized vocabulary. The functions 'save_word_weights()' and 'load_word_weights()' handle this scenario by writing the data to disk as a sparse matrix and loading it into the R session. These functions require a model of class 'ParallelTopicModel'; an 'RTopicModel' will not work.
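As an illustration of the sparse representation, the following minimal sketch reads a small file of topic/word/weight triplets into a sparse matrix with the Matrix package. The tab-separated layout, the 0-based topic indices and the sample words are assumptions made for the example; this is not the internal implementation of 'load_word_weights()'.

```r
library(Matrix)

# Hypothetical input: a tab-separated file with one "topic, word, weight"
# triplet per line and 0-based topic indices (an assumed layout, for
# illustration only). Only nonzero weights are listed.
f <- tempfile(fileext = ".txt")
writeLines(
  c("0\tapple\t3", "0\tpear\t1", "1\tpear\t2", "2\tplum\t5"),
  f
)

triplets <- read.table(
  f, sep = "\t", header = FALSE, quote = "", comment.char = "",
  col.names = c("topic", "word", "weight"),
  colClasses = c("integer", "character", "numeric")
)

# Topics become rows (shifted to 1-based indices), words become columns.
vocab <- unique(triplets$word)
m <- sparseMatrix(
  i = triplets$topic + 1L,
  j = match(triplets$word, vocab),
  x = triplets$weight,
  dims = c(max(triplets$topic) + 1L, length(vocab)),
  dimnames = list(NULL, vocab)
)
m
```

Only the four nonzero cells are stored, which is what keeps memory usage manageable for many topics and a large vocabulary.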

Usage

load_word_weights(
  filename,
  minimized = TRUE,
  beta_coeff,
  normalized = TRUE,
  verbose = TRUE
)

save_word_weights(
  model,
  destfile = tempfile(),
  minimized = FALSE,
  verbose = TRUE
)

Arguments

filename

A file with word weights.

minimized

A 'logical' value. When saving, whether to write only word weights with nonzero values (without smoothing); when loading, whether the file contains only such values.

beta_coeff

For smoothing, a coefficient (beta) is added to the values of the matrix. Ideally, state the value explicitly in the function call; if it is missing, it will be guessed from the data.

normalized

A 'logical' value, whether to normalize.

verbose

A 'logical' value, whether to output progress messages.

model

A topic model (class 'jobjRef').

destfile

Length-one 'character' vector, the filename of the output file.
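The interplay of 'beta_coeff' and 'normalized' can be sketched with base R and the Matrix package. The counts below are hypothetical, normalization is assumed to be per topic (each row summing to 1), and the rule used to guess beta is an illustrative heuristic, not necessarily the one 'load_word_weights()' applies.

```r
library(Matrix)

# Hypothetical unsmoothed topic-word counts (2 topics x 3 words), stored
# sparsely with only nonzero values, i.e. a "minimized" layout.
counts <- sparseMatrix(
  i = c(1, 1, 2), j = c(1, 2, 3), x = c(3, 1, 4),
  dims = c(2, 3), dimnames = list(NULL, c("apple", "pear", "plum"))
)
beta <- 0.01  # smoothing coefficient, stated explicitly

# Smoothing adds beta to every cell, so the result is no longer sparse.
smoothed <- as.matrix(counts) + beta

# Per-topic normalization (an assumption): divide each row by its sum.
normalized <- smoothed / rowSums(smoothed)

# If beta is unknown but the smoothed matrix is available, every cell with
# a zero count holds exactly beta, so the minimum is one plausible guess.
# Hypothetical heuristic, for illustration only.
beta_guess <- min(smoothed)
```

The heuristic only works on smoothed data that contains at least one zero count; for a minimized file holding unsmoothed values, stating 'beta_coeff' explicitly is the safer choice.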

Details

The function 'save_word_weights()' writes the word weights to a file (argument 'destfile') that can be handled as a sparse matrix. Internally, it uses the method '$printTopicWordWeights()' of the 'ParallelTopicModel' class. The (parsed) content of the file is equivalent to the matrix that can be obtained directly from the class using the method '$getTopicWords(FALSE, TRUE)'. Thus, the values are not normalized but smoothed (i.e., the coefficient beta is added to the values).
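The equivalence between smoothed weights and a sparse representation can be sketched as follows: subtracting beta from smoothed weights recovers the raw counts, which are mostly zero and can be stored sparsely, while adding beta back yields the smoothed, unnormalized matrix again. The values and beta = 0.5 are hypothetical, chosen so that the arithmetic is exact in floating point.

```r
library(Matrix)

# Hypothetical smoothed weights for 2 topics x 3 words, as they might appear
# in a full (not minimized) file: every cell is listed and equals count + beta.
beta <- 0.5
smoothed <- matrix(
  c(3.5, 0.5,   # column "apple"
    1.5, 0.5,   # column "pear"
    0.5, 4.5),  # column "plum"
  nrow = 2,
  dimnames = list(NULL, c("apple", "pear", "plum"))
)

# Subtracting beta recovers the raw counts; sparse storage keeps only the
# nonzero cells.
counts <- Matrix(smoothed - beta, sparse = TRUE)

# Adding beta back reproduces the smoothed (unnormalized) matrix.
restored <- as.matrix(counts) + beta
```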

Examples

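library(biglda)
# Load a MALLET topic model ('ParallelTopicModel') shipped with the package
bin <- system.file(package = "biglda", "extdata", "mallet", "lda_mallet.bin")
lda <- mallet_load_topicmodel(bin)
# Write the word weights to a temporary file, then read them back
fname <- save_word_weights(lda)
word_weights <- load_word_weights(fname)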

PolMine/biglda documentation built on Feb. 25, 2023, 11:24 p.m.