write_mallet_model: A convenience function for saving all the model outputs at once
In agoldst/dfrtopics: Tools for exploring topic models of text

write_mallet_model

R Documentation

A convenience function for saving all the model outputs at once.

Description

Save a series of files with the results of an LDA run. By default this writes a number of files, several of them large (notably the MALLET sampling state).

Usage

write_mallet_model(
  m,
  output_dir = ".",
  n_top_words = 50,
  save_instances = FALSE,
  save_scaled = FALSE,
  save_state = TRUE,
  simplify_state = TRUE
)

Arguments

`m`	`mallet_model` object
`output_dir`	where to save all the output files.
`n_top_words`	number of top key words per topic to save in `top_words.csv`
`save_instances`	if TRUE, extract the instance list from the trainer object and save it to `instances.mallet`
`save_scaled`	if TRUE, write a file of 2D coordinates for the topics to `topic_scaled.csv`
`save_state`	if TRUE, save the MALLET sampling state in MALLET's format
`simplify_state`	if TRUE, save the sampling state in a simplified CSV format (requires Python)
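
As a hedged illustration of how these arguments combine, the sketch below wraps the call in a small helper that skips both forms of the sampling state, which avoids the largest file and the Python dependency. The helper name `save_model_outputs`, its `out_dir` default, and the choice of `n_top_words = 20` are hypothetical and not part of dfrtopics; `m` is assumed to be a trained `mallet_model` object.

```r
# Hypothetical helper (not part of dfrtopics): write a lighter set of outputs.
# Assumes `m` is a trained mallet_model object and that dfrtopics is installed.
save_model_outputs <- function(m, out_dir = file.path(tempdir(), "model_out")) {
  if (!requireNamespace("dfrtopics", quietly = TRUE)) {
    stop("the dfrtopics package is required")
  }
  # Create the directory up front; this sketch does not assume that
  # write_mallet_model creates it.
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  dfrtopics::write_mallet_model(
    m,
    output_dir = out_dir,
    n_top_words = 20,        # illustrative value; the default is 50
    save_state = FALSE,      # skip MALLET's own sampling-state file
    simplify_state = FALSE   # skip the simplified CSV state (needs Python)
  )
  # Return the names of whatever files were written.
  list.files(out_dir)
}
```

Calling `save_model_outputs(m)` on a fitted model would then return the names of the files produced in a temporary directory.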

Details

The following files are written to output_dir:

topic_words.csv: unnormalized topic-word matrix, CSV format
vocabulary.txt: list of words (same order as columns of topic-word matrix), one per line
params.txt: various model parameters, including hyperparameters
top_words.csv: topic key words, CSV format; see top_words for the format
doc_topics.csv: document-topic matrix, CSV format
mallet_state.gz: MALLET sampling state (a large file; written if save_state is TRUE)
state.csv: simplified version of the sampling state (written if simplify_state is TRUE)
diagnostics.xml: MALLET model diagnostics
doc_ids.txt: instance IDs, one per line
instances.mallet: the source text "instances" file (written only if save_instances is TRUE; not done by default)
topic_scaled.csv: CSV with scaled 2D coordinates for the topics, obtained by applying cmdscale to a matrix of topic divergences calculated by topic_divergences (written only if save_scaled is TRUE; not done by default)
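
To make the last item more concrete, here is a minimal sketch of the general idea: normalize an unnormalized topic-word matrix, compute pairwise divergences between topics, and pass them to `cmdscale`. The toy counts are made up, and the Jensen-Shannon divergence in `js_divergence` is an assumed stand-in; the measure that `topic_divergences` actually computes may differ.

```r
# Toy unnormalized topic-word counts: 3 topics (rows) x 4 words (columns).
# These numbers are invented for illustration only.
tw <- matrix(c(8, 1, 1, 0,
               1, 7, 1, 1,
               0, 1, 2, 7),
             nrow = 3, byrow = TRUE)

# Normalize each row to sum to 1, giving one word distribution per topic.
p <- tw / rowSums(tw)

# Jensen-Shannon divergence between two distributions.
# Assumption: used here as a stand-in for the package's divergence measure.
js_divergence <- function(a, b) {
  m <- (a + b) / 2
  # Terms with x == 0 contribute 0 to the Kullback-Leibler sum.
  kl <- function(x, y) sum(ifelse(x > 0, x * log(x / y), 0))
  (kl(a, m) + kl(b, m)) / 2
}

# Symmetric matrix of pairwise topic divergences.
n <- nrow(p)
d <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- js_divergence(p[i, ], p[j, ])
  }
}

# Classical multidimensional scaling down to two coordinates per topic.
coords <- cmdscale(d, k = 2)
```

Each row of `coords` holds the 2D position of one topic; topics with similar word distributions end up closer together.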