select_top_topics: Select top topics by document counts or temporal trend metric


View source: R/topic-analysis.R

Description

select_top_topics selects a specified number of top topics based on various properties of the topic frequencies. It is typically used to pick topic frequency time series for plotting and exploratory analysis. See the function arguments for the available selection options.

Usage

select_top_topics(topicFrequencies, topN = 25,
  selectBy = "most_frequent", selectTopics = NULL)

Arguments

topicFrequencies

a dataframe of topic frequencies as returned by topic_frequencies

topN

the number of top topics to return, ranked by the criterion given in selectBy

selectBy

the selection approach, which determines the metric by which topic_ids are sorted to select the topN topics. Currently, the following options are supported:

most_frequent

the default; select topics based on the total number of documents in which the topic occurs (note that the document count depends on the minimum topic likelihood minGamma that was specified when obtaining the topic frequencies).

trending_up

select topics with the largest upward trend; internally this is measured by the slope of a simple linear regression fit to a topic_id's frequency series.

trending_down

select topics with the largest downward trend; internally this is measured by the slope of a simple linear regression fit to a topic_id's frequency series.

trending

select topics with either the largest upward or the largest downward trend; internally this is measured by the absolute value of the slope of a simple linear regression fit to a topic_id's frequency series.

most_volatile

select topics whose frequencies fluctuate most around their linear trend over the covered time period; internally this is measured by the residual standard deviation of the linear model fit to a topic_id's time frequency series.

topic_id

select the topics whose topic_ids are given in the function argument selectTopics.

selectTopics

a vector of topic IDs by which the returned results should be filtered; this argument is only used when selectBy is "topic_id".
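Each selectBy option amounts to sorting the topics by one metric and keeping the first topN. The base-R sketch below illustrates that idea under stated assumptions; it is not the package's implementation. The helper rank_topics and the toy metric values are hypothetical, the columns are named after those listed under Value, and applying topN to the "topic_id" selection as well is an assumption.

```r
# Hypothetical per-topic metrics (toy values, not real package output).
metrics <- data.frame(
  topic_id     = c("t1", "t2", "t3", "t4"),
  n_doc_topics = c(120, 45, 300, 80),
  slope        = c(0.8, -1.5, 0.1, 2.0),
  volatility   = c(3.2, 0.5, 1.1, 4.7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: order topics by the metric named in selectBy
# and keep the first topN rows.
rank_topics <- function(metrics, topN = 25, selectBy = "most_frequent",
                        selectTopics = NULL) {
  ord <- switch(selectBy,
    most_frequent = order(metrics$n_doc_topics, decreasing = TRUE),
    trending_up   = order(metrics$slope, decreasing = TRUE),
    trending_down = order(metrics$slope),                      # most negative first
    trending      = order(abs(metrics$slope), decreasing = TRUE),
    most_volatile = order(metrics$volatility, decreasing = TRUE),
    topic_id      = which(metrics$topic_id %in% selectTopics), # plain filter
    stop("unsupported selectBy: ", selectBy)
  )
  head(metrics[ord, , drop = FALSE], topN)
}

rank_topics(metrics, topN = 2, selectBy = "trending")
# topic_id: "t4" "t2"
```

Using abs() for "trending" lets a steep decline (t2) outrank a mild rise (t1), matching the option description.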

Value

a dataframe of the topic metrics used to select the top topics, with the columns:

topic_id

a unique topic identifier

n_doc_topics

the total number of documents in the dataset in which the topic with this topic_id occurs

slope

the slope coefficient of a linear model fit to this topic_id's time frequency series

volatility

the residual standard deviation of a linear model fit to this topic_id's time frequency series

trend

a categorisation of the topic frequency trend
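The slope and volatility columns can be illustrated by fitting a simple linear regression to each topic's frequency series. The sketch below is a minimal illustration, not the package's code: the long-format input with columns topic_id, period, n_docs and frequency, and the helper topic_metrics, are hypothetical, and n_doc_topics is assumed to be the sum of per-period document counts (valid only if each document falls into exactly one period). The trend column is omitted because its categories are not documented here.

```r
# Hypothetical long-format frequency series for two topics over 4 periods.
freqs <- data.frame(
  topic_id  = rep(c("t1", "t2"), each = 4),
  period    = rep(1:4, times = 2),
  n_docs    = c(10, 12, 15, 20, 30, 25, 18, 10),
  frequency = c(0.10, 0.12, 0.15, 0.20, 0.30, 0.25, 0.18, 0.10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one lm() fit per topic; slope is the period
# coefficient, volatility is the residual standard deviation.
topic_metrics <- function(freqs) {
  rows <- lapply(split(freqs, freqs$topic_id), function(d) {
    fit <- lm(frequency ~ period, data = d)
    data.frame(
      topic_id     = d$topic_id[1],
      n_doc_topics = sum(d$n_docs),  # assumes one period per document
      slope        = unname(coef(fit)["period"]),
      volatility   = sigma(fit),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

topic_metrics(freqs)
```

Here stats::sigma() returns the residual standard deviation of the fitted lm object, which is the quantity described for volatility; t1 gets a positive slope (0.033) and t2 a negative one (-0.067).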


sdaume/topicsplorrr documentation built on Dec. 22, 2021, 11:11 p.m.