mallet: Interface to mallet topic modelling.

Description

Functionality to support the following workflow (see examples): (a) turn a partition_bundle object into a mallet instance list, (b) store the resulting jobjRef object, (c) run mallet topic modelling, and (d) turn the ParallelTopicModel Java object into an LDA_Gibbs object from the package topicmodels.

Usage

mallet_make_instance_list(
  x,
  p_attribute = "word",
  phrases = NULL,
  terms_to_drop = tm::stopwords("de"),
  mc = TRUE,
  verbose = TRUE
)

as_LDA(x, verbose = TRUE, beta = NULL, gamma = NULL)

mallet_instance_list_store(x, filename = tempfile())

mallet_instance_list_load(filename)

mallet_load_topicmodel(filename)

Arguments

x

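A partition_bundle object for mallet_make_instance_list(). As the examples show, as_LDA() is called with the topic model returned by mallet_load_topicmodel(), and mallet_instance_list_store() with the instance list.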

p_attribute

The p_attribute to use, typically "word" or "lemma".

phrases

A phrases object (S4 class from polmineR) that will be used to concatenate phrases.

terms_to_drop

A character vector of terms to drop, typically stopwords. Defaults to tm::stopwords("de").

mc

A logical value, whether to use multicore.

verbose

A logical value, whether to be verbose.

beta

The beta matrix for a topic model.

gamma

The gamma matrix for a topic model.

filename

Path of the file where the Java object is stored, or from which it is loaded.

...

further parameters
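
As a rough illustration of what the terms_to_drop argument is for, the base R sketch below removes a set of unwanted terms from a token vector. It does not reproduce how mallet_make_instance_list() works internally; the token vector and the short stopword list are made up for the example and stand in for tm::stopwords("de").

```r
# Toy token stream; in practice the tokens come from the corpus.
tokens <- c("die", "Regierung", "hat", "den", "Haushalt", "beschlossen")

# Hypothetical stand-in for tm::stopwords("de"), kept short for readability.
terms_to_drop <- c("die", "den", "hat")

# Keep only the tokens that are not listed in terms_to_drop.
kept <- tokens[!tokens %in% terms_to_drop]
print(kept)
```

The match is exact and case-sensitive in this sketch, so "Die" would not be removed by "die".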

Details

The as_LDA() function turns an estimated topic model prepared using 'mallet' into an LDA_Gibbs object as defined in the topicmodels package. This may be useful for applying topic model evaluation tools that are available for the LDA_Gibbs class but not for the immediate output of mallet topic modelling. Note that the gamma matrix is normalized and smoothed, and that the beta matrix is the logarithmized matrix of normalized and smoothed values obtained from the input mallet topic model.
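
The normalization, smoothing and logarithm described above can be sketched in base R. This is a toy illustration rather than the code as_LDA() actually runs: the count matrices, the smoothing constant 0.01 and the helper smooth_and_normalize() are assumptions made up for the example.

```r
# Toy topic-by-term counts (2 topics, 3 terms); the values are invented.
topic_word_counts <- matrix(
  c(8, 1, 1,
    0, 5, 5),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("topic1", "topic2"), c("haus", "markt", "steuer"))
)

# Hypothetical helper: add a smoothing constant to every cell,
# then scale each row so that it sums to 1.
smooth_and_normalize <- function(counts, smoothing = 0.01) {
  smoothed <- counts + smoothing
  smoothed / rowSums(smoothed)
}

# beta: smoothed and normalized term probabilities per topic, on the log scale.
beta_prob <- smooth_and_normalize(topic_word_counts)
beta_log <- log(beta_prob)

# gamma: smoothed and normalized topic shares per document, kept as probabilities.
doc_topic_counts <- matrix(c(30, 10, 0, 40), nrow = 2, byrow = TRUE)
gamma <- smooth_and_normalize(doc_topic_counts)

print(round(exp(beta_log), 3))
print(round(gamma, 3))
```

Smoothing matters here because a zero count would otherwise yield log(0), that is -Inf, in the beta matrix.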

The function mallet_instance_list_load will load a Java InstanceList object that has been saved to disk (e.g. by using the mallet_instance_list_store function). The return value is a jobjRef object. Internally, the function reuses code of the function load.mallet.instances from the R package mallet.

The function mallet_load_topicmodel will load a topic model created using mallet into memory.

Examples

# Preparations: Create instance list

polmineR::use("polmineR")
speeches <- polmineR::as.speeches("GERMAPARLMINI", s_attribute_name = "speaker")

if (requireNamespace("rJava")){
  # options(java.parameters = "-Xmx4g")
  library(rJava)
  .jinit()
  # We need to put the jars from mallet 2.0 on the classpath because
  # only the newer mallet version (not the one included in mallet R package)
  # has method ParallelTopicModel$getDocumentTopics() needed by as_LDA
  .jaddClassPath("/opt/mallet-2.0.8/class") # after .jinit()
  .jaddClassPath("/opt/mallet-2.0.8/lib/mallet-deps.jar")
  instance_list <- topicanalysis::mallet_make_instance_list(speeches)
  instancefile <- mallet_instance_list_store(instance_list)
}

# Option 1: Run mallet from R using mallet package

if (requireNamespace("mallet")){
  lda <- mallet::MalletLDA(num.topics = 20)
  lda$loadDocuments(instance_list)
  lda$setAlphaOptimization(20, 50)
  lda$train(100)
}

# Option 2: Use ParallelTopicModel class - has write()-method

if (requireNamespace("mallet")){
  destfile <- tempfile()
  lda <- rJava::.jnew("cc/mallet/topics/ParallelTopicModel", 25L, 5.1, 0.1)
  lda$addInstances(instance_list)
  lda$setNumThreads(1L)
  lda$setTopicDisplay(50L, 10L)
  lda$setNumIterations(150L)
  lda$estimate()
  lda$write(rJava::.jnew("java/io/File", destfile))
}

# Load topicmodel and turn it into LDA_Gibbs

mallet_lda <- mallet_load_topicmodel(destfile)
## Not run: 
topicmodels_lda <- as_LDA(mallet_lda)

## End(Not run)

PolMine/polmineR.topics documentation built on March 6, 2020, 6:03 p.m.