read_mallet_state: Load MALLET sampling state from disk


Description

A helper function to read in a MALLET sampling state.

Usage

read_mallet_state(filename, doc_ids = NULL)

Arguments

filename

name of gzip file holding the sampling state

doc_ids

character vector of document IDs. If supplied, the doc column of the resulting dataframe will be populated with these values. If it is not supplied, then the doc column will be document numbers (from 1, not 0 as in the state file).

Details

Does not require the mallet package. As long as the supplied file is of the expected format (e.g. from command-line MALLET), this will do the job, provided you have enough RAM to hold the whole sampling state in memory as a data frame.

To get a list of stored document IDs from a model object model, use model$getDocumentNames().
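The reindexing described above can be illustrated with base R alone. This is only a sketch, not the package's implementation: it assumes the state file consists of `#`-prefixed header lines followed by space-separated rows of doc, source, pos, typeindex, type, and topic, and the file contents and document IDs are made up for the example.

```r
# Write a tiny gzip file in the layout assumed for a MALLET state file:
# '#'-prefixed header lines, then one space-separated row per token.
state_file <- tempfile(fileext = ".gz")
con <- gzfile(state_file, "w")
writeLines(c(
  "#doc source pos typeindex type topic",
  "#alpha : 0.5 0.5",
  "#beta : 0.01",
  "0 NA 0 0 whale 1",
  "0 NA 1 1 ship 0",
  "1 NA 0 0 whale 1"
), con)
close(con)

# read.table drops lines starting with '#' (its default comment.char);
# gzfile() handles the decompression.
raw <- read.table(
  gzfile(state_file),
  header = FALSE,
  quote = "",
  stringsAsFactors = FALSE,
  col.names = c("doc", "source", "pos", "typeindex", "type", "topic")
)

# Keep the three documented columns, shifting doc and topic from
# 0-based (state file) to 1-based numbering.
state <- data.frame(
  doc = raw$doc + 1L,
  word = raw$type,
  topic = raw$topic + 1L,
  stringsAsFactors = FALSE
)

# Hypothetical document IDs, in the same order as the documents in the state.
doc_ids <- c("moby_dick", "typee")
state_named <- state
state_named$doc <- doc_ids[state$doc]

unlink(state_file)
```

Here `state` mirrors the result when doc_ids is not supplied, and `state_named` mirrors the result when it is.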

Value

a data frame with three columns: doc, word, and topic. doc is either a document index (counting from 1) or an ID if doc_ids is supplied; word is the token as a string; and topic is the topic number (counting from 1, not 0).
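Because the result has one row per token, ordinary tabulation is enough to get count matrices out of it. The sketch below uses a hand-built data frame with the documented columns as a stand-in for a real return value; the rows are invented for illustration.

```r
# Hand-built stand-in for the documented return value (doc, word, topic).
state <- data.frame(
  doc = c(1L, 1L, 2L, 2L),
  word = c("whale", "ship", "whale", "whale"),
  topic = c(2L, 1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Number of tokens of each word assigned to each topic.
topic_word <- table(topic = state$topic, word = state$word)

# Number of tokens in each document assigned to each topic.
doc_topic <- table(doc = state$doc, topic = state$topic)
```

Both tables are indexed by character labels, e.g. `topic_word["2", "whale"]`.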

See Also

write_mallet_state


agoldst/litdata documentation built on May 10, 2019, 7:34 a.m.