docs_top_topics: Top-ranked topics for documents

docs_top_topics    R Documentation

Top-ranked topics for documents

Description

This function extracts the n most salient topics for each document from the document-topic matrix.
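The ranking described above can be illustrated on a plain numeric matrix. This is a minimal sketch, not the package's implementation: it assumes a matrix `dt` with documents in rows and topics in columns instead of a mallet_model, and the helper `top_topics_sketch` is hypothetical. The output columns mirror the doc/topic/weight layout given under Value, with doc being the row index.

```r
# Hypothetical sketch (not dfrtopics code): for each row of a document-topic
# matrix `dt`, rank topics by their transformed weight and keep the top n.
top_topics_sketch <- function(dt, n, weighting = function(x) x / rowSums(x)) {
  w <- weighting(dt)  # weights actually used for ranking (proportions by default)
  rows <- lapply(seq_len(nrow(w)), function(d) {
    # indices of the n largest weights in document d, largest first
    ord <- order(w[d, ], decreasing = TRUE)[seq_len(min(n, ncol(w)))]
    data.frame(doc = d, topic = ord, weight = w[d, ord])
  })
  do.call(rbind, rows)  # stack the per-document results
}

# Toy topic counts: 2 documents, 3 topics
dt <- matrix(c(5, 1, 4,
               0, 6, 2), nrow = 2, byrow = TRUE)
res <- top_topics_sketch(dt, n = 2)
print(res)
```

Here document 1 keeps topics 1 and 3 (proportions 0.5 and 0.4) and document 2 keeps topics 2 and 3 (0.75 and 0.25), so the result has n rows per document.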

Usage

docs_top_topics(m, n, ...)

Arguments

m

mallet_model object

n

number of top topics to extract for each document

weighting

a function used to transform the document-topic matrix before ranking. By default, topics are ranked by their proportions within each document, which gives the same ranking as the raw weights.
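Why proportions and raw weights rank topics identically can be checked directly: converting a row to proportions divides every entry by the same positive total, which cannot change their order. The sketch below assumes a plain count matrix rather than a mallet_model, and `proportion_weighting` is a hypothetical stand-in for whatever function is passed as `weighting`.

```r
# Hypothetical weighting function: divide each row (document) by its total,
# turning raw topic weights into per-document proportions.
# Note: a row summing to zero would produce NaN.
proportion_weighting <- function(dt) dt / rowSums(dt)

dt <- matrix(c(5, 1, 4,
               0, 6, 2), nrow = 2, byrow = TRUE)
p <- proportion_weighting(dt)

# Each row was scaled by a positive constant, so the within-document
# ranking of topics is the same before and after weighting.
for (d in seq_len(nrow(dt))) {
  stopifnot(identical(order(dt[d, ], decreasing = TRUE),
                      order(p[d, ], decreasing = TRUE)))
}
print(rowSums(p))  # each document's proportions sum to 1
```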

Details

Here, as elsewhere, "saliency" can be defined in several ways. The simplest choice, and the default, is to rank topics by the proportion of each document they account for. Alternatively, we might want to penalize topics that are widespread across the whole corpus. TODO: this alternative weighting is not yet implemented.
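One way such a corpus-wide penalty could look is an idf-style reweighting, borrowed from tf-idf. This is purely an illustrative assumption: the package does not implement it, and `idf_weighting`, its `threshold` parameter, and the choice of log(D / df) are all hypothetical. A topic counts as "present" in a document when its proportion reaches the threshold, and each topic's proportions are scaled by the log of (number of documents / documents where it is present).

```r
# Hypothetical alternative weighting (NOT implemented in dfrtopics):
# down-weight topics that are prominent in many documents, tf-idf style.
idf_weighting <- function(dt, threshold = 0.2) {
  p <- dt / rowSums(dt)                # per-document topic proportions
  df <- colSums(p >= threshold)        # number of documents where each topic is prominent
  idf <- log(nrow(p) / pmax(df, 1))    # pmax guards against dividing by zero
  sweep(p, 2, idf, `*`)                # scale each topic's column by its idf
}

dt <- matrix(c(5, 1, 4,
               6, 0, 2,
               4, 3, 1), nrow = 3, byrow = TRUE)
w <- idf_weighting(dt)

# Topic 1 is prominent in every document, so its idf is log(1) = 0 and it
# can no longer be any document's top topic.
print(apply(dt / rowSums(dt), 1, which.max))  # top topic by raw proportion
print(apply(w, 1, which.max))                 # top topic after the penalty
```

With raw proportions topic 1 tops every document; after the penalty the top topics become 3, 3 and 2. Zeroing a topic that appears everywhere is a strong choice; a smoothed idf or a division by mean corpus proportion would penalize more gently.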

Value

a data frame with n rows for each document and three columns: doc, the numerical index of the document in doc_ids(m); topic; and weight, the weight used in ranking (the topic proportion, by default).

See Also

doc_topics


agoldst/dfrtopics documentation built on July 15, 2022, 4:13 p.m.