load_from_mallet_state: Load model from MALLET state output
In agoldst/dfrtopics: Tools for exploring topic models of text

Description

If you created a topic model with command-line MALLET or another tool, this function loads that model into a `mallet_model` object suitable for use with this package. It reads the gzipped text file that records the Gibbs sampling state, from which the document-topic and topic-word matrices can be derived. The model vocabulary and document ID list are obtained from the MALLET instances file.
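
To illustrate how a sampling state yields the two matrices, here is a minimal sketch in base R. It is not the package's own implementation; the toy file contents, the assumed three-line `#` header, and all variable names are illustrative assumptions. Each data row of the state is taken to record one token as `doc source pos typeindex type topic`, so tabulating rows by document and topic, or by topic and word type, gives the count matrices.

```r
# Toy stand-in for a MALLET state file (assumed layout): three "#" header
# lines, then one space-separated row per token.
state_lines <- c(
    "#doc source pos typeindex type topic",
    "#alpha : 0.5 0.5",
    "#beta : 0.01",
    "0 NA 0 0 whale 0",
    "0 NA 1 1 sea 0",
    "0 NA 2 0 whale 1",
    "1 NA 0 2 love 1",
    "1 NA 1 1 sea 1"
)
state_file <- tempfile(fileext = ".gz")
con <- gzfile(state_file, "w")
writeLines(state_lines, con)
close(con)

# Read the token rows, skipping the header. Quote and comment handling are
# switched off because word types may contain those characters.
state <- read.table(gzfile(state_file), skip = 3, sep = " ",
    quote = "", comment.char = "",
    col.names = c("doc", "source", "pos", "typeindex", "type", "topic"),
    stringsAsFactors = FALSE)

# Topic indices are zero-based, so K topics are numbered 0..K-1.
n_topics <- max(state$topic) + 1
topic_f <- factor(state$topic, levels = 0:(n_topics - 1))

# Document-topic counts: tokens in each document assigned to each topic.
doc_topics <- table(doc = state$doc, topic = topic_f)

# Topic-word counts: tokens of each word type assigned to each topic.
topic_words <- table(topic = topic_f, type = state$type)
```

These are raw counts; smoothing with the alpha and beta hyperparameters and normalizing rows would be needed to turn them into probability estimates. For a real state file with millions of rows, a plain `read.table` call like this is likely to be slow and memory-hungry, which is the problem the `bigmemory` option addresses.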

Usage

load_from_mallet_state(
  mallet_state_file,
  simplified_state_file = file.path(dirname(mallet_state_file), "state.csv"),
  instances_file = NULL,
  keep_sampling_state = TRUE,
  metadata_file = NULL,
  bigmemory = TRUE
)

Arguments

`mallet_state_file`	name of gzipped state file
`simplified_state_file`	name of the file in which to save the "simplified" representation of the state (see `simplify_state`). If NULL, a temporary file will be used.
`instances_file`	location of the MALLET instances file used to create the model. If NULL, this step is skipped, but the resulting model object will lack the vocabulary and document IDs.
`keep_sampling_state`	If TRUE (default), the returned object will hold a reference to the sampling state `big.matrix` as well.
`metadata_file`	metadata file (CSV or TSV; optional here)
`bigmemory`	If TRUE (default), the bigmemory and bigtabulate packages will be used to read and store the Gibbs sampling state. If this does not work, try `bigmemory = FALSE`, but note that this uses more memory and that the result will not hold the sampling state (`sampling_state` will be NULL).

Value

A `mallet_model` object.

Examples

## Not run: 
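# Train a model on the command line and write out the gzipped sampling state
system("mallet train-topics --input instances.mallet \\
    --output-state topic-state.gz")
# Load the state, saving its simplified form to state.csv; the instances
# file supplies the vocabulary and document IDs
m <- load_from_mallet_state("topic-state.gz",
    simplified_state_file = "state.csv",
    instances_file = "instances.mallet")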

## End(Not run)

agoldst/dfrtopics documentation built on July 15, 2022, 4:13 p.m.