infer_topics: Infer document topics

infer_topics    R Documentation

Infer document topics

Description

Given an already-trained topic model, infer topic proportions for new documents. Inference uses the same Gibbs sampling process as training a topic model, except that the topic-word proportions are held fixed rather than updated.

Usage

infer_topics(m, instances, ...)

## S3 method for class 'mallet_model_inferred'
print(x)

## S3 method for class 'mallet_model_inferred'
summary(x)

## S3 method for class 'mallet_model_inferred'
docs_top_topics(m, n)

## S3 method for class 'mallet_model_inferred'
top_docs(m, n)

Arguments

m

either a topic inferencer object (from read_inferencer or inferencer) or a mallet_model object

instances

an InstanceList object. It must be compatible with the instances on which the inferencer was trained (i.e., its vocabulary must correspond). Use compatible_instances to generate this.

n_iterations

number of Gibbs sampling iterations

sampling_interval

thinning interval (number of iterations between retained samples)

burn_in

number of burn-in iterations

seed

integer random seed; set for reproducibility
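
To illustrate how the sampling arguments above interact, here is a minimal base-R sketch of Gibbs inference for a single document with the topic-word probabilities held fixed. It is a toy under stated assumptions, not the dfrtopics or MALLET implementation: the function name infer_one_doc, the inputs phi, words, and alpha, the default values, and the exact rule for which iterations are retained are all hypothetical choices made for this example.

```r
# Toy illustration only: NOT the dfrtopics or MALLET implementation.
# phi:   K x V matrix of fixed topic-word probabilities (rows sum to 1)
# words: vector of word indices (1..V) for one new document
# alpha: length-K vector of document-topic Dirichlet hyperparameters
infer_one_doc <- function(phi, words, alpha,
                          n_iterations = 100, sampling_interval = 10,
                          burn_in = 10, seed = NULL) {
    if (!is.null(seed)) set.seed(seed)
    K <- nrow(phi)
    # random initial topic assignment for each token
    z <- sample.int(K, length(words), replace = TRUE)
    counts <- tabulate(z, nbins = K)
    acc <- numeric(K)   # running sum of retained topic weights
    n_saved <- 0

    for (iter in seq_len(n_iterations)) {
        for (i in seq_along(words)) {
            # remove token i from the counts, then resample its topic
            counts[z[i]] <- counts[z[i]] - 1
            p <- (counts + alpha) * phi[, words[i]]
            z[i] <- sample.int(K, 1, prob = p)
            counts[z[i]] <- counts[z[i]] + 1
            # phi is never updated: only the document's topic counts change
        }
        # assumed schedule: after burn-in, keep every sampling_interval-th state
        if (iter > burn_in && (iter - burn_in) %% sampling_interval == 0) {
            acc <- acc + counts + alpha
            n_saved <- n_saved + 1
        }
    }
    stopifnot(n_saved > 0)   # the settings must retain at least one sample
    acc / sum(acc)           # averaged, normalized topic proportions
}

# made-up two-topic, four-word model and a six-token document
phi <- rbind(c(0.45, 0.45, 0.05, 0.05),
             c(0.05, 0.05, 0.45, 0.45))
theta <- infer_one_doc(phi, words = c(1, 2, 1, 2, 1, 3),
                       alpha = c(0.1, 0.1), seed = 42)
```

Averaging over several thinned samples, rather than taking only the final state, reduces the noise in the estimate; passing the same seed makes the result repeatable.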

Value

a model object of class mallet_model_inferred, which inherits from mallet_model. It does not have all the elements of the original topic model, however: the new value of interest is the matrix of estimated document-topic weights, accessible via doc_topics, while the inferencer's sampling state and hyperparameters are not accessible. MALLET supplies estimated topic proportions, which are multiplied by the document lengths to obtain the doc-topics matrix.
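
The rescaling from proportions to weights described above can be sketched in base R as follows. The numbers are made up for illustration and the variable names (theta, doc_lengths, weights) are hypothetical, not part of the package; the point is only that the resulting weights are generally real-valued rather than integer counts, and that the proportions can be recovered by row-normalizing.

```r
# Hypothetical values for illustration; not output from a real model.
# theta: one row per document, one column per topic; rows sum to 1
theta <- rbind(c(0.7, 0.2, 0.1),
               c(0.1, 0.1, 0.8))
doc_lengths <- c(200, 50)   # token counts of the two documents

# multiply row i by doc_lengths[i]; a vector with one entry per row
# recycles down the columns, which gives exactly this row-wise scaling
weights <- theta * doc_lengths

# dividing each row by its sum recovers the proportions
props <- weights / rowSums(weights)
```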

Examples

## Not run: 
# beginning with a model m and new documents docs
# (%>% is the magrittr pipe, which dplyr also exports):
inferred_m <- make_instances(docs) %>%
    infer_topics(m, .)

# extract new doc-topic matrix
doc_topics(inferred_m)
# or a convenient data frame of high-ranking topics in each doc
docs_top_topics(inferred_m, n=3)
# or, similarly, but for high-ranking documents in each topic
top_docs(inferred_m, n=3)

## End(Not run)


agoldst/dfrtopics documentation built on July 15, 2022, 4:13 p.m.