The word weights matrix (the weight of each word for each topic) can become very
large when a model has many topics and a sizeable vocabulary. The functions
mallet_save_word_weights and mallet_load_word_weights handle this scenario by
writing the data to disk as a sparse matrix and loading it back into the R
session. Using them requires the ParallelTopicModel class; the RTopicModel class
will not work.
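To illustrate the general idea of passing a matrix through a file in sparse form, here is a minimal base R sketch. It is not the file format MALLET writes and not the implementation of either function: the toy matrix, the tab-separated triplet layout and all variable names are assumptions made for illustration only.

```r
# Toy weights matrix (topics x words); the values are made up
dense <- matrix(
  c(0,   2.5, 0, 0,
    1.0, 0,   0, 4.0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("1", "2"), c("bank", "law", "tax", "vote"))
)

# Keep only the non-zero cells as (row, column, value) triplets
nz <- which(dense != 0, arr.ind = TRUE)
triplets <- data.frame(
  i = unname(nz[, "row"]),
  j = unname(nz[, "col"]),
  v = dense[nz]
)

# Write the triplets to a temporary file and read them back
# (hypothetical layout, chosen for this sketch only)
destfile <- tempfile(fileext = ".tsv")
write.table(triplets, destfile, sep = "\t", row.names = FALSE, quote = FALSE)
loaded <- read.table(destfile, header = TRUE, sep = "\t")

# Rebuild the dense matrix from the triplets
restored <- matrix(
  0, nrow = nrow(dense), ncol = ncol(dense), dimnames = dimnames(dense)
)
restored[cbind(loaded$i, loaded$j)] <- loaded$v
```

Only the non-zero cells are written, so the size of the file depends on the number of non-zero entries rather than on the full topics-by-vocabulary dimensions.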
mallet_load_word_weights(filename)
mallet_save_word_weights(model, destfile = tempfile())
filename | A file with word weights.
model | A topic model (class
destfile | Length-one
The function mallet_save_word_weights writes a file (argument destfile) that can
be handled as a sparse matrix. Internally, it uses the printTopicWordWeights
method of the ParallelTopicModel class. The (parsed) content of the file is
equivalent to the matrix that can be obtained directly from the class using the
getTopicWords(FALSE, TRUE) method. Thus, values are not normalised, but smoothed
(i.e. the coefficient beta is added to the values).
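What "smoothed, but not normalised" means can be sketched with a toy example in base R. The counts and the value of beta below are invented for illustration and are not output of a real model; the final step shows how such values could be turned into per-topic probabilities if needed.

```r
# Hypothetical raw topic-word counts (2 topics x 4 words)
counts <- matrix(
  c(5, 0, 2, 1,
    0, 3, 0, 7),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("1", "2"), c("bank", "law", "tax", "vote"))
)
beta <- 0.01  # assumed smoothing coefficient

# Smoothed, not normalised: beta is added to every cell
smoothed <- counts + beta

# Normalising divides each row by its sum, so that every topic
# becomes a probability distribution over the vocabulary
normalised <- smoothed / rowSums(smoothed)
```

Because of the smoothing, words that never occur in a topic still have a small positive weight, and the rows of the smoothed matrix do not sum to one until they are normalised.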
## Not run:
polmineR::use("polmineR")
speeches <- polmineR::as.speeches("GERMAPARLMINI", s_attribute_name = "speaker")
library(rJava)
.jinit()
.jaddClassPath("/opt/mallet-2.0.8/class") # after .jinit(), not before
.jaddClassPath("/opt/mallet-2.0.8/lib/mallet-deps.jar")
instance_list <- topicanalysis::mallet_make_instance_list(speeches)
instancefile <- mallet_instance_list_store(instance_list)
lda <- mallet::MalletLDA(num.topics = 20)
lda$loadDocuments(instance_list)
lda$setAlphaOptimization(20, 50)
lda$train(100)
# This is the call used internally by 'as_LDA()'. The difference
# is that the arguments of the $getTopicWords()-method are FALSE
# (argument 'normalized') and TRUE (argument 'smoothed')
beta_1 <- rJava::.jevalArray(lda$getTopicWords(FALSE, TRUE), simplify = TRUE)
alphabet <- strsplit(lda$getAlphabet()$toString(), "\n")[[1]]
colnames(beta_1) <- alphabet
beta_1 <- beta_1[, alphabet[order(alphabet)] ]
rownames(beta_1) <- as.character(1:nrow(beta_1))
# This approach uses a (temporary) file written to disk. The
# advantage is that the data is passed as a sparse matrix.
fname <- mallet_save_word_weights(lda)
word_weights <- mallet_load_word_weights(fname)
beta_2 <- t(as.matrix(word_weights))
# Demonstrate the equivalence of the two approaches
identical(rownames(beta_1), rownames(beta_2))
identical(colnames(beta_1), colnames(beta_2))
identical(apply(beta_1, 1, order), apply(beta_2, 1, order))
identical(beta_1, beta_2)
## End(Not run)