write_mallet_model: A convenience function for saving all the model outputs at once.

View source: R/model.R

write_mallet_model    R Documentation

A convenience function for saving all the model outputs at once.

Description

Save a series of files with the results of an LDA run. With the default arguments this produces several files, some of them large (notably the MALLET sampling state).

Usage

write_mallet_model(
  m,
  output_dir = ".",
  n_top_words = 50,
  save_instances = FALSE,
  save_scaled = FALSE,
  save_state = TRUE,
  simplify_state = TRUE
)

Arguments

m

a mallet_model object

output_dir

directory in which to save the output files

n_top_words

number of top words per topic to save in top_words.csv

save_instances

if TRUE, extract the instance list from the trainer object and save it to instances.mallet

save_scaled

if TRUE, write a file of scaled 2D coordinates for the topics (topic_scaled.csv)

save_state

if TRUE, save the MALLET sampling state in MALLET's format

simplify_state

if TRUE, save the sampling state in a simplified CSV format (requires Python)

Details

The following files are written to output_dir (some only when the corresponding argument is TRUE):

topic_words.csv

unnormalized topic-word matrix, CSV format

vocabulary.txt

list of words (same order as columns of topic-word matrix), one per line

params.txt

various model parameters, including hyperparameters

top_words.csv

topic key words CSV; see top_words for the format

doc_topics.csv

document-topic matrix CSV

mallet_state.gz

MALLET sampling state, gzipped (can be very large; written when save_state = TRUE)

state.csv

simplified CSV version of the sampling state (written when simplify_state = TRUE)

diagnostics.xml

MALLET model diagnostics

doc_ids.txt

instance IDs, one per line

instances.mallet

the source-text "instances" file (written only when save_instances = TRUE; not done by default)

topic_scaled.csv

CSV with scaled 2D coordinates for the topics (written only when save_scaled = TRUE). Obtained by applying cmdscale to a matrix of topic divergences calculated by topic_divergences.
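To illustrate how coordinates like those in topic_scaled.csv can be derived, here is a minimal base-R sketch. It is not the package's implementation. It assumes that the divergences are Jensen-Shannon divergences between the row-normalized topic-word distributions, which is an assumption here; check the topic_divergences documentation for what dfrtopics actually computes. The toy matrix tw and the helper js_divergence are hypothetical stand-ins. Only stats::cmdscale is a real function used as-is.

```r
# Hypothetical unnormalized topic-word counts (3 topics x 4 words),
# standing in for the contents of topic_words.csv.
tw <- matrix(c(10, 5, 0, 1,
                0, 2, 8, 6,
                9, 6, 1, 0),
             nrow = 3, byrow = TRUE)

# Normalize each row (topic) to a probability distribution over the vocabulary.
tw_norm <- tw / rowSums(tw)

# Hypothetical helper: Jensen-Shannon divergence with base-2 logs (assumed
# measure). Terms with zero probability contribute 0 to each KL divergence.
js_divergence <- function(p, q) {
  m <- (p + q) / 2
  kl <- function(a, b) sum(ifelse(a > 0, a * log2(a / b), 0))
  (kl(p, m) + kl(q, m)) / 2
}

# Pairwise topic-by-topic divergence matrix.
n <- nrow(tw_norm)
d <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- js_divergence(tw_norm[i, ], tw_norm[j, ])
  }
}

# Classical multidimensional scaling down to two dimensions.
coords <- cmdscale(as.dist(d), k = 2)
colnames(coords) <- c("x", "y")
coords
```

In this toy data, topics 1 and 3 share most of their probability mass, so they land close together in the 2D layout, while topic 2 sits apart from both.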


agoldst/dfrtopics documentation built on July 15, 2022, 4:13 p.m.