infer_topics: Infer document topics
In agoldst/dfrtopics: Tools for exploring topic models of text

infer_topics

R Documentation

Description

Given an already-trained topic model, infer topic proportions for new documents. Inference resembles the Gibbs sampling process used to train a topic model, except that the topic-word proportions are held fixed rather than updated.
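To make the idea concrete, here is a minimal base-R sketch of Gibbs sampling for a single new document with the topic-word probabilities held fixed. It is an illustration only, not the code MALLET or this package runs: the function `infer_doc_sketch`, the matrix `phi`, the scalar `alpha`, and the toy data are all hypothetical, and for brevity it returns the final sampling state instead of averaging over several saved samples.

```r
# Sketch only: hypothetical names, not part of dfrtopics or MALLET.
# words: integer word ids for one document
# phi:   K x V matrix of fixed topic-word probabilities (each row sums to 1)
# alpha: symmetric document-topic smoothing parameter (assumed scalar)
infer_doc_sketch <- function(words, phi, alpha, n_iterations = 50) {
    K <- nrow(phi)
    z <- sample.int(K, length(words), replace = TRUE)  # random initial topics
    counts <- tabulate(z, nbins = K)  # tokens currently assigned to each topic
    for (iter in seq_len(n_iterations)) {
        for (i in seq_along(words)) {
            counts[z[i]] <- counts[z[i]] - 1   # take token i out of the counts
            # phi is never updated: only the document's own counts change
            p <- phi[, words[i]] * (counts + alpha)
            z[i] <- sample.int(K, 1, prob = p)
            counts[z[i]] <- counts[z[i]] + 1   # put token i back
        }
    }
    (counts + alpha) / sum(counts + alpha)     # smoothed topic proportions
}

set.seed(1)
# toy model: 2 topics over a 4-word vocabulary
phi <- rbind(c(0.45, 0.45, 0.05, 0.05),
             c(0.05, 0.05, 0.45, 0.45))
theta <- infer_doc_sketch(words = c(1, 2, 1, 1, 2, 3), phi = phi, alpha = 0.1)
theta
```

Because `phi` stays constant, each new document can be processed independently of the others and of the original training corpus.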

Usage

infer_topics(m, instances, ...)

## S3 method for class 'mallet_model_inferred'
print(x)

## S3 method for class 'mallet_model_inferred'
summary(x)

## S3 method for class 'mallet_model_inferred'
docs_top_topics(m, n)

## S3 method for class 'mallet_model_inferred'
top_docs(m, n)

Arguments

`m`	either a topic inferencer object (from `read_inferencer` or `inferencer`) or a `mallet_model` object
`instances`	an InstanceList object. It must be compatible (i.e., its vocabulary must correspond) with the instances on which `inferencer` was trained. Use `compatible_instances` to generate this.
`n_iterations`	number of Gibbs sampling iterations
`sampling_interval`	thinning interval (number of iterations between saved samples)
`burn_in`	number of burn-in iterations
`seed`	integer random seed; set for reproducibility
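As a rough illustration of how the three sampling arguments interact, the sketch below lists which iterations would contribute a saved sample under one common convention: discard the burn-in iterations, then keep every `sampling_interval`-th iteration after that. The values are hypothetical, not the function's defaults, and the exact rule MALLET applies is an assumption here.

```r
# Hypothetical settings, chosen only for illustration
n_iterations <- 100
burn_in <- 10
sampling_interval <- 10

iters <- seq_len(n_iterations)
# assumed convention: skip burn-in, then keep every sampling_interval-th iteration
kept <- iters[iters > burn_in & (iters - burn_in) %% sampling_interval == 0]
kept
length(kept)  # 9 saved samples: iterations 20, 30, ..., 100
```

Under this convention, raising `burn_in` or `sampling_interval` without raising `n_iterations` leaves fewer samples to average over.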

Value

A model object of class `mallet_model_inferred`, which inherits from `mallet_model`. It does not have all the elements of the original topic model; the new value of interest is the matrix of estimated document-topic weights, accessible via `doc_topics`. The inferencer's sampling state and hyperparameters are not accessible. MALLET supplies estimated topic proportions for each document, which are multiplied by the document lengths to obtain the doc-topics matrix.
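The scaling from proportions to weights can be sketched in base R. The matrix `theta` and the vector `doc_lengths` below are made-up stand-ins for the proportions MALLET reports and the token counts of the new documents; they are not objects returned by this package.

```r
# hypothetical topic proportions: 3 documents (rows) x 2 topics (columns)
theta <- rbind(c(0.8, 0.2),
               c(0.5, 0.5),
               c(0.1, 0.9))
doc_lengths <- c(10, 4, 20)   # hypothetical token counts per document

# multiply each row by its document's length
# (the vector is recycled down the columns, so row i is scaled by doc_lengths[i])
dt_weights <- theta * doc_lengths
dt_weights

# each row of weights sums to the document length ...
rowSums(dt_weights)
# ... so dividing by the row sums recovers the proportions
dt_weights / rowSums(dt_weights)
```

The same row normalization, applied to the matrix returned by `doc_topics`, should give back per-document topic proportions if the weights are constructed as described above.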

Examples

## Not run: 
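# beginning with a model m and new documents docs
# (%>% is the magrittr pipe; the instances passed to infer_topics must be
# compatible with m's vocabulary, see compatible_instances):
inferred_m <- make_instances(docs) %>%
    infer_topics(m, .)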

# extract new doc-topic matrix
doc_topics(inferred_m)
# or a convenient data frame of high-ranking topics in each doc
docs_top_topics(inferred_m, n=3)
# or, similarly, but for high-ranking documents in each topic
top_docs(inferred_m, n=3)

## End(Not run)

agoldst/dfrtopics documentation built on July 15, 2022, 4:13 p.m.