Functionality to support the following workflow (see examples): (a) turn a partition_bundle object into a mallet instance list, (b) store the resulting jobjRef object, (c) run mallet topic modelling and (d) turn the ParallelTopicModel Java object into an LDA_Gibbs object from the package topicmodels.
mallet_make_instance_list(
  x,
  p_attribute = "word",
  phrases = NULL,
  terms_to_drop = tm::stopwords("de"),
  mc = TRUE,
  verbose = TRUE
)
as_LDA(x, verbose = TRUE, beta = NULL, gamma = NULL)
mallet_instance_list_store(x, filename = tempfile())
mallet_instance_list_load(filename)
mallet_load_topicmodel(filename)
x | A
p_attribute | The p_attribute to use, typically "word" or "lemma".
phrases | A
terms_to_drop | stopwords
mc | A
verbose | A
beta | The beta matrix for a topic model.
gamma | The gamma matrix for a topic model.
filename | Where to store the Java-object.
... | further parameters
The as_LDA() function turns an estimated topic model prepared using 'mallet' into an LDA_Gibbs object as defined in the topicmodels package. This may be useful for applying topic model evaluation tools that are available for the LDA_Gibbs class, but not for the immediate output of mallet topic modelling. Note that the gamma matrix is normalized and smoothed, and the beta matrix is the logarithmized matrix of normalized and smoothed values obtained from the input mallet topic model.
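To make "normalized and smoothed" and "logarithmized" concrete, the following base R sketch applies these steps to toy count matrices. It is only an illustration: the counts, the smoothing constants and the helper name normalize_smoothed() are hypothetical, and the values that as_LDA() actually obtains from mallet may be computed differently in detail.

```r
# Hypothetical helper (not part of topicanalysis): add a smoothing constant
# to every cell, then rescale each row so that it sums to 1.
normalize_smoothed <- function(counts, smoothing) {
  smoothed <- counts + smoothing
  smoothed / rowSums(smoothed)
}

# Toy document-by-topic counts (3 documents, 2 topics); made-up numbers.
doc_topic_counts <- matrix(c(8, 2, 0, 5, 5, 10), nrow = 3, byrow = TRUE)
# gamma: one probability distribution over topics per document
gamma <- normalize_smoothed(doc_topic_counts, smoothing = 0.1)

# Toy topic-by-term counts (2 topics, 4 terms); made-up numbers.
topic_term_counts <- matrix(c(4, 0, 1, 5, 0, 6, 3, 1), nrow = 2, byrow = TRUE)
# beta: logarithm of the smoothed, normalized term probabilities per topic
beta <- log(normalize_smoothed(topic_term_counts, smoothing = 0.01))

# Rows of gamma and of exp(beta) are probability distributions.
stopifnot(all(abs(rowSums(gamma) - 1) < 1e-12))
stopifnot(all(abs(rowSums(exp(beta)) - 1) < 1e-12))
```

Smoothing keeps every probability strictly positive, so taking the logarithm for beta never yields -Inf, even for terms that a topic never produced in the toy counts.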
The function mallet_instance_list_load will load a Java InstanceList object that has been saved to disk (e.g. by using the mallet_instance_list_store function). The return value is a jobjRef object. Internally, the function reuses code of the function load.mallet.instances from the R package mallet.
The function mallet_load_topicmodel will load a topic model created using mallet into memory.
# Preparations: Create instance list
polmineR::use("polmineR")
speeches <- polmineR::as.speeches("GERMAPARLMINI", s_attribute_name = "speaker")
if (requireNamespace("rJava")){
  # options(java.parameters = "-Xmx4g")
  library(rJava)
  .jinit()
  # We need to put the jars from mallet 2.0 on the classpath because
  # only the newer mallet version (not the one included in mallet R package)
  # has method ParallelTopicModel$getDocumentTopics() needed by as_LDA
  .jaddClassPath("/opt/mallet-2.0.8/class") # after .jinit()
  .jaddClassPath("/opt/mallet-2.0.8/lib/mallet-deps.jar")
  instance_list <- topicanalysis::mallet_make_instance_list(speeches)
  instancefile <- mallet_instance_list_store(instance_list)
}

# Option 1: Run mallet from R using mallet package
if (requireNamespace("mallet")){
  lda <- mallet::MalletLDA(num.topics = 20)
  lda$loadDocuments(instance_list)
  lda$setAlphaOptimization(20, 50)
  lda$train(100)
}

# Option 2: Use ParallelTopicModel class - has write()-method
if (requireNamespace("mallet")){
  destfile <- tempfile()
  lda <- rJava::.jnew("cc/mallet/topics/ParallelTopicModel", 25L, 5.1, 0.1)
  lda$addInstances(instance_list)
  lda$setNumThreads(1L)
  lda$setTopicDisplay(50L, 10L)
  lda$setNumIterations(150L)
  lda$estimate()
  lda$write(rJava::.jnew("java/io/File", destfile))
}

# Load topicmodel and turn it into LDA_Gibbs
mallet_lda <- mallet_load_topicmodel(destfile)
## Not run:
topicmodels_lda <- as_LDA(mallet_lda)
## End(Not run)