View source: R/sampling_state.R
read_sampling_state | R Documentation |
This function reads in a Gibbs sampling state represented by document,word,topic,count rows to a big.matrix. This gives the model's assignments of words to topics within documents. MALLET itself remembers token order, but in ordinary LDA the words are assumed exchangeable within documents. The recommended interface to this sampling state is load_sampling_state, which calls this function.
read_sampling_state(filename, data_type = "integer", big_workdir = tempdir())
filename | the name of a CSV file holding the simplified state: a CSV with header row and four columns, |
data_type | the C++ type to store the data in. If all values have magnitude less than 2^15, you can get away with |
big_workdir | the working directory where |
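A call with the arguments above might look like the following sketch. It assumes the dfrtopics and bigmemory packages are installed (the call is skipped otherwise), and the three-row CSV is made-up data standing in for a simplified state file, not output from a real model.

```r
# Hypothetical simplified state: zero-indexed document, word, topic, plus a count
state_file <- tempfile(fileext = ".csv")
writeLines(c(
    "document,word,topic,count",
    "0,0,1,2",
    "0,2,0,1",
    "1,1,1,3"
), state_file)

state <- NULL
if (requireNamespace("dfrtopics", quietly = TRUE) &&
        requireNamespace("bigmemory", quietly = TRUE)) {
    # Arguments spelled out at their documented defaults
    state <- dfrtopics::read_sampling_state(
        state_file,
        data_type = "integer",
        big_workdir = tempdir()
    )
    print(dim(state))
}
```

If scratch space in the default temporary directory is limited, big_workdir can presumably point at any other writable directory.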
N.B. The MALLET sampling state and the "simplified state" file read by this function index documents, words, and topics from zero, but the big.matrix returned by this function indexes these from one, for convenience within R.
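The index convention can be illustrated in base R. This is only a sketch of the zero-to-one shift on made-up rows, using an ordinary data frame rather than a big.matrix; it is not the package's implementation.

```r
# Made-up rows in the on-disk convention: indices start at zero
on_disk <- read.csv(text = "document,word,topic,count
0,0,1,2
0,2,0,1
1,1,1,3")

# Shift the three index columns to R's one-based convention;
# count is a tally, not an index, so it stays as it is
in_r <- on_disk
index_cols <- c("document", "word", "topic")
in_r[index_cols] <- on_disk[index_cols] + 1L

print(in_r)
```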
a big.matrix with four columns, document,word,topic,count. Documents, words, and topics are one-indexed in the result, so these values may be used as indices to the vectors returned by doc_ids, vocabulary, doc_topics, etc.
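For instance, with one-based values the word column can index directly into a vocabulary vector. The sketch below uses an ordinary matrix in place of a big.matrix, and both the vocabulary and the rows are hypothetical stand-ins for what vocabulary and this function would return.

```r
# Hypothetical stand-ins: a three-word vocabulary and a one-indexed state
vocab <- c("whale", "ship", "sea")
state <- matrix(
    c(1L, 1L, 2L, 2L,
      1L, 3L, 1L, 1L,
      2L, 2L, 2L, 3L),
    ncol = 4, byrow = TRUE,
    dimnames = list(NULL, c("document", "word", "topic", "count"))
)

# Look up the word type for each row directly by index
words <- vocab[state[, "word"]]

# Total count of each word assigned to each topic
# (rows: words, columns: topics)
topic_word <- xtabs(
    count ~ word + topic,
    data = data.frame(word = words,
                      topic = state[, "topic"],
                      count = state[, "count"])
)
print(topic_word)
```

The same pattern would apply to the document column and a vector of document ids.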
load_mallet_state, write_mallet_state, tdm_topic, simplify_state, and package bigmemory.