tdm_topic: The term-document matrix for a topic

tdm_topic    R Documentation

The term-document matrix for a topic

Description

Extracts a matrix of counts of words assigned to a given topic in each document from the model's final Gibbs sampling state.

Usage

tdm_topic(m, topic)

Arguments

m

a mallet_model object with the sampling state loaded (see read_sampling_state). The sampling state is operated on using mwhich from the bigmemory package.

topic

the number of the topic (indexed from 1) whose term-document weights are to be extracted

Details

This is useful for studying a topic conditional on some metadata covariate. The most frequent words in the overall topic distribution are not necessarily the most frequent words of that topic within a given subgroup of documents, particularly if language use varies widely across the corpus. If, for example, the corpus stretches over a long time period, consider comparing the early and late parts of each of the within-topic term-document matrices.
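The early/late comparison suggested above can be sketched as follows. This is a minimal illustration, not the package's own workflow: the small sparse matrix stands in for the result of tdm_topic(m, topic), and vocab, doc_year and the 1950 cut-off are hypothetical values invented for the example. Only the Matrix package is assumed.

```r
library(Matrix)

# Toy stand-in for the result of tdm_topic(m, topic): words in rows,
# documents in columns. All values here are invented for illustration.
vocab <- c("steam", "engine", "railway", "computer", "network")
doc_year <- c(1900, 1905, 1910, 1990, 1995, 2000)  # hypothetical metadata
tdm <- sparseMatrix(
    i = c(1, 2, 3, 1, 2, 3, 4, 5, 4, 5, 2),
    j = c(1, 1, 2, 3, 3, 3, 4, 4, 5, 6, 6),
    x = c(4, 2, 3, 5, 1, 2, 6, 3, 4, 5, 1),
    dims = c(length(vocab), length(doc_year))
)

# Split the documents at an arbitrary cut-off year and total the
# within-topic counts of each word in each group.
early <- doc_year < 1950
counts <- cbind(
    early = as.numeric(rowSums(tdm[, early, drop = FALSE])),
    late = as.numeric(rowSums(tdm[, !early, drop = FALSE]))
)
rownames(counts) <- vocab

# Most frequent word of the topic within each group, and overall
top_early <- vocab[which.max(counts[, "early"])]
top_late <- vocab[which.max(counts[, "late"])]
top_overall <- vocab[which.max(rowSums(counts))]
```

In this toy data the top word of the topic differs between the two groups of documents, which is the situation the paragraph above warns about. With a real model, the column ordering of doc_ids(m) would be used to line up the metadata with the matrix columns.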

Value

a sparseMatrix of within-topic word weights (unsmoothed and unnormalized), with words in rows and documents in columns, in the same order as vocabulary(m) and doc_ids(m) respectively
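Because the returned weights are raw counts, one may wish to normalize them before comparing documents of different lengths. The following sketch shows one possible way to do so. It assumes only the Matrix package and uses an invented toy matrix in place of the real return value. The names tdm, doc_totals and tdm_prop are illustrative.

```r
library(Matrix)

# Toy stand-in for tdm_topic(m, topic); values are invented.
tdm <- sparseMatrix(
    i = c(1, 2, 1, 3),
    j = c(1, 1, 2, 2),
    x = c(3, 1, 2, 2),
    dims = c(3, 3),
    dimnames = list(c("alpha", "beta", "gamma"), NULL)
)

# Number of tokens assigned to the topic in each document
doc_totals <- colSums(tdm)

# Scale each column to sum to 1. Documents with no tokens in this topic
# (all-zero columns) are left as zeros instead of dividing by zero.
inv_totals <- ifelse(doc_totals > 0, 1 / doc_totals, 0)
tdm_prop <- tdm %*% Diagonal(x = inv_totals)
```

Multiplying by a diagonal matrix keeps the result sparse, which matters when the vocabulary and the number of documents are large.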

See Also

read_sampling_state, mallet_model, load_sampling_state, top_n_row, sum_col_groups


agoldst/dfrtopics documentation built on July 15, 2022, 4:13 p.m.