instancefile <- tempfile() # Use findable path
modelfile <- tempfile() # Use findable path
The memory available to the Java Virtual Machine (JVM) must be set before the JVM is created, that is, before loading rJava or any package that starts it.
options(java.parameters = "-Xmx4g") # Sufficient for larger data
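To confirm that the heap setting took effect, one option is to ask the running JVM for its maximum memory. The following is a minimal sketch, assuming rJava is installed and that no JVM has been started earlier in the session; the variable names are illustrative only.

```r
# Must be set before the JVM is started (i.e. before .jinit() or any
# package that initialises rJava is loaded).
options(java.parameters = "-Xmx4g")

library(rJava)
.jinit()  # starts the JVM if it is not running yet

# Ask the JVM for its maximum heap size (returned in bytes).
runtime <- .jcall("java/lang/Runtime", "Ljava/lang/Runtime;", "getRuntime")
max_heap_bytes <- .jcall(runtime, "J", "maxMemory")

# Convert to GiB; the value should be close to the requested 4 GiB.
max_heap_gib <- max_heap_bytes / 1024^3
print(max_heap_gib)
```

If the reported value is much smaller than requested, the JVM was most likely already running when the option was set, and restarting the R session is the usual remedy.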
library("polmineR")
library(topicanalysis)
library(mallet) # Includes mallet jars
library(rJava)
library(data.table)
use("polmineR")
coi <- "GERMAPARLMINI"
speeches <- as.speeches(coi, s_attribute_name = "speaker")
Keep only documents with a minimum length.
doc_min_length <- 100L
dt <- as.data.table(summary(speeches))
speeches <- speeches[[ dt[size >= doc_min_length][["name"]] ]]
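The filtering idiom above can be tried in isolation on a toy table. This sketch assumes only that the summary table has a name column and a size column, as the code above implies; the table dt_toy and its values are made up for illustration.

```r
library(data.table)

doc_min_length <- 100L

# Hypothetical stand-in for as.data.table(summary(speeches))
dt_toy <- data.table(
  name = c("speech_a", "speech_b", "speech_c"),
  size = c(250L, 40L, 100L)
)

# Keep the names of documents with at least doc_min_length tokens
keep <- dt_toy[size >= doc_min_length][["name"]]
print(keep)
```

The resulting character vector of names is what gets passed to `[[` to subset the bundle of speeches.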
instance_list <- mallet_make_instance_list(speeches)
Implicitly, mallet_make_instance_list() uses the stopword list of the tm package.
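To see which terms such a stopword list contains, the list can be inspected directly. This is a sketch under the assumption that the German list is the relevant one (the corpus is German); which list and language mallet_make_instance_list() actually uses by default should be checked in its documentation.

```r
library(tm)

# German stopword list shipped with tm (assumed to be the relevant one here)
stopwords_de <- tm::stopwords("de")

length(stopwords_de)     # number of stopwords in the list
head(stopwords_de, 10L)  # first few entries
```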
Starting to estimate the topic model at: `r (format(started <- Sys.time(), format = "%T"))`
lda <- .jnew("cc/mallet/topics/ParallelTopicModel", 25L, 5.1, 0.1) # 25 topics, alpha sum 5.1, beta 0.1
lda$addInstances(instance_list)
lda$setNumThreads(1L)
lda$setTopicDisplay(50L, 10L) # every 50 iterations, show top 10 words per topic
lda$setNumIterations(2000L)
lda$estimate()
Finished computation at `r (format(finished <- Sys.time(), "%T"))` (total time: `r format(Sys.time() - started, format = "%T")`).
lda$write(rJava::.jnew("java/io/File", modelfile))