load_from_mallet_state: Load model from MALLET state output

load_from_mallet_state    R Documentation

Load model from MALLET state output

Description

If you have created a topic model with command-line MALLET or another tool, this function loads that model into a mallet_model object suitable for use in this package. It reads the gzipped text file representing the Gibbs sampling state, from which the document-topic and topic-word matrices can be derived. The model vocabulary and document ID list are obtained from the MALLET instances file.
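To make the derivation concrete, here is a minimal base-R sketch that tabulates document-topic and topic-word counts from a toy state file. It is not how this package does it. The file layout (header lines starting with "#", then one row per token with the columns doc, source, pos, typeindex, type, topic) is an assumption about MALLET's output, and the toy data and variable names are invented for illustration.

```r
# Write a toy stand-in for a gzipped MALLET state file.
# ASSUMPTION: "#" header lines, then one whitespace-separated row per token.
state_lines <- c(
  "#doc source pos typeindex type topic",
  "#alpha : 0.5 0.5",
  "#beta : 0.01",
  "0 NA 0 0 apple 0",
  "0 NA 1 1 banana 1",
  "0 NA 2 0 apple 0",
  "1 NA 0 2 cherry 1",
  "1 NA 1 1 banana 1"
)
state_file <- tempfile(fileext = ".gz")
con <- gzfile(state_file, "w")
writeLines(state_lines, con)
close(con)

# Read the token rows; comment.char = "#" skips the header lines.
state <- read.table(
  gzfile(state_file),
  col.names = c("doc", "source", "pos", "typeindex", "type", "topic"),
  comment.char = "#",
  quote = "",
  stringsAsFactors = FALSE
)

# Topics are numbered from 0 in this toy file.
n_topics <- max(state$topic) + 1
topic_levels <- seq_len(n_topics) - 1

# Count tokens per (document, topic) and per (topic, word type).
doc_topics <- table(
  doc = factor(state$doc),
  topic = factor(state$topic, levels = topic_levels)
)
topic_words <- table(
  topic = factor(state$topic, levels = topic_levels),
  word = state$type
)

print(doc_topics)
print(topic_words)
```

These are raw counts. Smoothing and normalizing them into probabilities would additionally need the alpha and beta hyperparameters, which the sketch ignores.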

Usage

load_from_mallet_state(
  mallet_state_file,
  simplified_state_file = file.path(dirname(mallet_state_file), "state.csv"),
  instances_file = NULL,
  keep_sampling_state = TRUE,
  metadata_file = NULL,
  bigmemory = TRUE
)

Arguments

mallet_state_file

name of gzipped state file

simplified_state_file

name of the file in which to save the "simplified" representation of the state (see simplify_state). By default, this is state.csv in the same directory as mallet_state_file. If NULL, a temporary file will be used.

instances_file

location of the MALLET instances file used to create the model. If NULL (default), loading the instances is skipped, and the resulting model object will lack the vocabulary and document IDs.

keep_sampling_state

If TRUE (default), the returned object will hold a reference to the sampling state big.matrix as well.

metadata_file

metadata file (CSV or TSV; optional here)

bigmemory

If TRUE (default), the bigmemory and bigtabulate packages will be used to read and store the Gibbs sampling state. If for some reason this does not work, try bigmemory = FALSE, but note that this will be more memory-intensive, and the result will not hold the sampling state (sampling_state will be NULL).
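One possible reason for the bigmemory route to fail is that the optional packages are not installed. The following sketch checks for them and sets the bigmemory argument accordingly. The state file name is hypothetical (borrowed from the Examples), and the loading call is guarded so the code does nothing when the dfrtopics package or the file is absent.

```r
# Check whether both optional packages are installed, without attaching them.
have_big <- requireNamespace("bigmemory", quietly = TRUE) &&
  requireNamespace("bigtabulate", quietly = TRUE)

# Hypothetical file name, matching the Examples section.
state_file <- "topic-state.gz"

# Only attempt the load if the package and the state file are both available.
if (requireNamespace("dfrtopics", quietly = TRUE) && file.exists(state_file)) {
  # instances_file is left at its NULL default here, so the resulting
  # model will lack the vocabulary and document IDs.
  m <- dfrtopics::load_from_mallet_state(
    state_file,
    bigmemory = have_big
  )
}
```

With bigmemory = FALSE the returned model does not hold the sampling state, so any later step that needs it must check for NULL.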

Value

a mallet_model object.

See Also

load_mallet_model, load_mallet_model_directory, write_mallet_state

Examples

## Not run: 
system("mallet train-topics --input instances.mallet \\
    --output-state topic-state.gz")
m <- load_from_mallet_state("topic-state.gz", "state.csv",
    "instances.mallet")

## End(Not run)


agoldst/dfrtopics documentation built on July 15, 2022, 4:13 p.m.