The word weights matrix (the weights of words for topics) can become very large
when there is a large number of topics and a sizeable vocabulary. The functions
mallet_save_word_weights and mallet_load_word_weights handle this scenario by
writing the data to disk as a sparse matrix and loading it into the R session.
To use these functions, the model needs to be a ParallelTopicModel; an
RTopicModel will not work.
mallet_load_word_weights(filename)
mallet_save_word_weights(model, destfile = tempfile())
filename |
A file with word weights. |
model |
A topic model (class |
destfile |
Length-one |
The function mallet_save_word_weights writes a file (argument destfile) that
can be read back in as a sparse matrix.
Internally, it uses the method printTopicWordWeights of the
ParallelTopicModel class. The (parsed) content of the file is
equivalent to the matrix that can be obtained directly from the class using the
getTopicWords(FALSE, TRUE) method. Thus, values are not normalised,
but smoothed (the coefficient beta is added to the values).
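To make the distinction between "smoothed" and "normalised" concrete, here is a minimal sketch in base R. The count matrix, the vocabulary and the value of beta are invented for illustration and do not come from a trained model; the sketch only mimics the arithmetic described above and does not call MALLET.

```r
# Hypothetical raw topic-word counts: 2 topics (rows) x 3 words (columns)
counts <- matrix(
  c(4, 0, 1,
    0, 3, 2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("1", "2"), c("apple", "bank", "court"))
)
beta <- 0.01  # assumed smoothing coefficient, chosen for illustration

# Smoothed but not normalised: beta is added to every cell, so no weight is zero
smoothed <- counts + beta

# Normalising afterwards would turn each topic (row) into a distribution
# that sums to 1; the word weights file skips this step
normalised <- smoothed / rowSums(smoothed)
```

Because the values are smoothed only, the rows of the word weights matrix do not sum to 1. Dividing each row by its sum, as in the last step, is one way to obtain per-topic word probabilities when they are needed.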
## Not run:
polmineR::use("polmineR")
speeches <- polmineR::as.speeches("GERMAPARLMINI", s_attribute_name = "speaker")
library(rJava)
.jinit()
.jaddClassPath("/opt/mallet-2.0.8/class") # after .jinit(), not before
.jaddClassPath("/opt/mallet-2.0.8/lib/mallet-deps.jar")
instance_list <- topicanalysis::mallet_make_instance_list(speeches)
instancefile <- mallet_instance_list_store(instance_list)
lda <- mallet::MalletLDA(num.topics = 20)
lda$loadDocuments(instance_list)
lda$setAlphaOptimization(20, 50)
lda$train(100)
# This is the call used internally by 'as_LDA()'. The difference
# is that the arguments of the $getTopicWords()-method are FALSE
# (argument 'normalized') and TRUE (argument 'smoothed')
beta_1 <- rJava::.jevalArray(lda$getTopicWords(FALSE, TRUE), simplify = TRUE)
alphabet <- strsplit(lda$getAlphabet()$toString(), "\n")[[1]]
colnames(beta_1) <- alphabet
beta_1 <- beta_1[, alphabet[order(alphabet)] ]
rownames(beta_1) <- as.character(1:nrow(beta_1))
# This approach uses a (temporary) file written to disk. The
# advantage is that the data is passed as a sparse matrix.
fname <- mallet_save_word_weights(lda)
word_weights <- mallet_load_word_weights(fname)
beta_2 <- t(as.matrix(word_weights))
# Demonstrate the equivalence of the two approaches
identical(rownames(beta_1), rownames(beta_2))
identical(colnames(beta_1), colnames(beta_2))
identical(apply(beta_1, 1, order), apply(beta_2, 1, order))
identical(beta_1, beta_2)
## End(Not run)
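The disk-based approach in the example above can be pictured with a small stand-alone sketch that needs neither Java nor MALLET. It assumes that the file holds one tab-separated line per topic-word pair (topic index counted from 0, word, weight); this layout, the file content and all variable names are assumptions made for illustration, and the actual implementation of mallet_load_word_weights may differ.

```r
library(Matrix)  # recommended package, part of standard R installations

# Hypothetical file content: "topic<TAB>word<TAB>weight", topics counted from 0
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "0\tapple\t4.01",
  "0\tbank\t0.01",
  "1\tapple\t0.01",
  "1\tbank\t3.01"
), tmp)

# Read the triplets; quote = "" keeps tokens containing quote characters intact
triplets <- read.delim(
  tmp, header = FALSE, quote = "",
  col.names = c("topic", "word", "weight"),
  stringsAsFactors = FALSE
)

# Build a words-by-topics sparse matrix from the triplets
vocab <- sort(unique(triplets$word))
weights_sketch <- sparseMatrix(
  i = match(triplets$word, vocab),  # row index: position of the word
  j = triplets$topic + 1L,          # column index: topic, shifted to 1-based
  x = triplets$weight,
  dimnames = list(vocab, as.character(seq_len(max(triplets$topic) + 1L)))
)

# Transpose to topics-by-words, mirroring t(as.matrix(word_weights)) above
beta_sketch <- t(as.matrix(weights_sketch))
```

Sorting the vocabulary before building the matrix mirrors the reordering of columns applied to beta_1 in the example, which is what makes a column-by-column comparison of the two results meaningful.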