paralleltopicmodel: Instantiate and load mallet topicmodel

BigTopicModel R Documentation

Instantiate and load mallet topicmodel

Description

Instantiate a Mallet topic model as a Java object, or load a topic model created using Mallet from disk.

Usage

BigTopicModel(
  instances = NULL,
  n_topics = 25L,
  alpha_sum = 5.1,
  beta = 0.1,
  threads = 1L,
  iterations = 1000L,
  verbose = TRUE,
  silent = FALSE
)

ParallelTopicModel(n_topics = 25L, alpha_sum = 5.1, beta = 0.1)

mallet_load_topicmodel(binfile, instancefile, statefile, verbose = TRUE)

Arguments

instances

A Mallet 'InstanceList' object.

n_topics

Number of topics (single 'integer' value).

alpha_sum

Passed into the constructor of the Java class (single 'numeric' value).

beta

Passed into the constructor of the Java class (single 'numeric' value).

threads

Number of threads/cores to use.

iterations

Number of iterations to run.

verbose

A 'logical' value, whether to output progress messages.

silent

A 'logical' value, defaults to 'FALSE'. If 'TRUE', all Mallet progress messages are muted.

binfile

Either a 'character' vector containing the path of a Mallet topic model (ParallelTopicModel), in which case tilde expansion will be applied, or a Java file object.

instancefile

Path to a serialized instance list (binary data format).

statefile

Path to a statefile (gzipped text file usually ending with .gz).

Details

The 'BigTopicModel()' function will instantiate a Java class object 'BigTopicModel', which inherits from the 'RTopicModel' and the 'ParallelTopicModel' classes. It adds a method '$getDocLengthCounts()' to those it inherits, providing fast access to document lengths.

The 'ParallelTopicModel()' function will instantiate a Java class object with the same name from the Mallet package. See the Mallet documentation of the class for details.
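
The arguments 'alpha_sum' and 'beta' are only described as being passed into the constructor. As a worked illustration, and assuming (this is not stated on this page) that Mallet spreads 'alpha_sum' evenly over all topics as a symmetric prior, the per-topic value implied by the defaults can be computed in plain R:

```r
# Defaults taken from the Usage section
n_topics <- 25L
alpha_sum <- 5.1

# Assumption: symmetric prior, so every topic gets an equal share of alpha_sum
alpha_per_topic <- rep(alpha_sum / n_topics, times = n_topics)

alpha_per_topic[1]    # 0.204
sum(alpha_per_topic)  # equals alpha_sum up to floating-point error
```

Under this assumption, changing 'n_topics' while leaving 'alpha_sum' untouched changes the per-topic value, which may be worth keeping in mind when comparing models of different sizes.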

The function 'mallet_load_topicmodel()' will load a topic model created using Mallet into a 'BigTopicModel' object.
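
Judging from the Usage and Examples sections, a model is restored either from a 'binfile' alone or from an 'instancefile' together with a 'statefile'. The following is a minimal sketch of how the file arguments could be checked before loading. The helper 'check_model_files()' is hypothetical and not part of biglda; it handles only 'character' paths, not Java file objects, and uses base R only.

```r
# Hypothetical helper (not part of biglda): expand and check file paths
# before handing them to mallet_load_topicmodel().
check_model_files <- function(binfile = NULL, instancefile = NULL, statefile = NULL) {
  files <- list(binfile = binfile, instancefile = instancefile, statefile = statefile)
  files <- Filter(Negate(is.null), files)  # keep only the arguments supplied

  has_bin <- "binfile" %in% names(files)
  has_pair <- all(c("instancefile", "statefile") %in% names(files))
  if (!has_bin && !has_pair) {
    stop("supply either 'binfile' or both 'instancefile' and 'statefile'")
  }

  # path.expand() replaces a leading "~" with the user's home directory
  files <- lapply(files, path.expand)

  not_found <- !vapply(files, file.exists, logical(1))
  if (any(not_found)) {
    stop("file(s) not found: ", paste(unlist(files[not_found]), collapse = ", "))
  }
  files
}

# Usage with a placeholder file standing in for a real model file
tmp <- tempfile(fileext = ".bin")
file.create(tmp)
args <- check_model_files(binfile = tmp)

# With biglda, rJava and Mallet available, the checked paths could then be
# passed on:
# model <- do.call(biglda::mallet_load_topicmodel, args)
```

Failing early in R gives a plain error message instead of a Java exception raised while reading the files.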

Examples

# Load a model shipped with the package and get document lengths
fname <- system.file(package = "biglda", "extdata", "mallet", "lda_mallet.bin")
bigmodel <- mallet_load_topicmodel(fname)
bigmodel$getDocLengthCounts()

# Write a newly instantiated model to a temporary file and reload it
pta <- ParallelTopicModel()
destfile <- tempfile()
pta$write(rJava::.jnew("java/io/File", destfile))
pta_reloaded <- mallet_load_topicmodel(destfile)

# Restore model from binfile
data_dir <- system.file(package = "biglda", "extdata", "mallet")
binfile <- file.path(data_dir, "lda_mallet.bin")
model <- mallet_load_topicmodel(binfile)

# Restore model from instance- and statefile
model <- mallet_load_topicmodel(
  instancefile = file.path(data_dir, "instance_list.mallet"),
  statefile = file.path(data_dir, "lda_mallet.gz")
)

PolMine/biglda documentation built on Feb. 25, 2023, 11:24 p.m.