plot_topic_frequencies: Plot individual topic frequencies by date in a faceted plot


View source: R/plot-topics.R

Description

plot_topic_frequencies plots time series of topic shares for a selection of topics in a faceted plot. Each topic is displayed in its own subplot, and each time series is overlaid with a linear trendline.

Usage

plot_topic_frequencies(topicsByDocDate, topicLabels = NULL,
  timeBinUnit = "week", topN = 25, minTopicTimeBins = 0.5,
  minGamma = 0.01, selectTopicsBy = "most_frequent",
  selectTopics = NULL, verboseLabels = FALSE, nCols = 5)

Arguments

topicsByDocDate

a dataframe as returned by topics_by_doc_date

topicLabels

a dataframe as returned by topics_terms_map, associating a topic_id with a suitable topic_label; if NULL (the default), suitable default labels will be generated.

timeBinUnit

a character string specifying the time period to use as the bin unit when computing topic share frequencies. Valid values are "day", "week", "month", "quarter", and "year"; "week" is the default. NOTE: when assigning weeks, Monday is considered the first day of the week.
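The Monday-first week convention can be illustrated in base R. This is only a sketch of the binning rule described above, not the package's actual implementation; the variable names are made up for illustration.

```r
# Illustrative dates: a Sunday, a Monday, a Wednesday, a Sunday, a Monday
dates <- as.Date(c("2021-12-19", "2021-12-20", "2021-12-22",
                   "2021-12-26", "2021-12-27"))

# "%u" yields the ISO weekday number: 1 = Monday, ..., 7 = Sunday
weekday <- as.integer(format(dates, "%u"))

# Shift each date back to the Monday that starts its week
week_start <- dates - (weekday - 1)

print(data.frame(date = dates, week_start = week_start))
```

Under this rule a Sunday belongs to the week that began on the preceding Monday, so 2021-12-19 falls into the bin starting 2021-12-13.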

topN

the number of top topics (according to the selection criteria in selectTopicsBy) that should be displayed

minTopicTimeBins

a double in the range [0,1] specifying the minimum share of all unique timebins in which a topic share of at least minGamma must have been recorded. For example, a value of 0.5 (the default) requires that a topic has been recorded in at least 50% of all unique timebins covered by the dataset. Topics that do not meet this threshold are not included in the returned results.

minGamma

the minimum share of a topic per document to be considered when summarizing topic frequencies; topics with smaller shares in an individual document are ignored when computing topic frequencies. The default is 0.01, but it should be adjusted in view of the number of topics and the average document length. (In an stm topic model, the likelihood that a document is generated from a topic is expressed by the value gamma.)
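How minGamma and minTopicTimeBins interact can be sketched with a toy example in base R. The data frame and its column names (topic_id, timebin, gamma) are hypothetical and are not necessarily the structure returned by topics_by_doc_date; the code only mirrors the thresholds as described, not the package internals.

```r
# Hypothetical toy data: one row per document/topic pair with its time bin
docs <- data.frame(
  topic_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  timebin  = c("w1", "w2", "w3", "w4", "w1", "w2", "w3", "w4"),
  gamma    = c(0.30, 0.20, 0.005, 0.40, 0.02, 0.001, 0.002, 0.003)
)

min_gamma <- 0.01           # plays the role of minGamma
min_topic_time_bins <- 0.5  # plays the role of minTopicTimeBins

# Number of unique time bins covered by the whole dataset
n_bins <- length(unique(docs$timebin))

# Step 1: ignore topic shares below the per-document threshold
kept <- docs[docs$gamma >= min_gamma, ]

# Step 2: share of time bins in which each topic still occurs
bins_per_topic <- tapply(kept$timebin, kept$topic_id,
                         function(b) length(unique(b)))
bin_share <- bins_per_topic / n_bins

# Step 3: keep only topics present in a sufficient share of time bins
topics_passing <- names(bin_share)[bin_share >= min_topic_time_bins]
print(topics_passing)
```

In this toy data, topic 1 clears the gamma threshold in three of four bins and is kept, while topic 2 does so in only one bin and is dropped.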

selectTopicsBy

the selection approach which determines the metric by which topic_ids will be sorted to select the topN topics. Currently, the following options are supported:

most_frequent

the default; select topics based on the total number of documents in which the topic occurs. (NOTE: the document count depends on the minimum topic likelihood minGamma that was specified when obtaining the topic frequencies.)

trending_up

select topics with the largest upward trend; internally this is measured by the slope of a simple linear regression fit to a topic_id's frequency series.

trending_down

select topics with the largest downward trend; internally this is measured by the slope of a simple linear regression fit to a topic_id's frequency series.

trending

select topics with either the largest upward or downward trend; internally this is measured by the absolute value of the slope of a simple linear regression fit to a topic_id's frequency series.

most_volatile

select topics with the largest change throughout the covered time period; internally this is measured by the residual standard deviation of the linear model fit to a topic_id's time frequency series.

topic_id

select topics specified by topic_id in the function argument selectTopics.
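The trend and volatility metrics named above (regression slope and residual standard deviation) can be computed with stats::lm as in the following sketch. The helper trend_metrics and the two example series are hypothetical; the package's own computation may differ in detail.

```r
# Two illustrative frequency series over six consecutive time bins
time_index    <- 1:6
freq_steady   <- c(1, 2, 4, 4, 6, 7)  # rises fairly steadily
freq_volatile <- c(3, 9, 1, 8, 2, 7)  # jumps around with little trend

# Hypothetical helper: slope and residual standard deviation of a linear fit
trend_metrics <- function(freq, t) {
  fit <- lm(freq ~ t)
  c(slope = unname(coef(fit)[2]),  # basis for trending_up / trending_down
    resid_sd = sigma(fit))         # basis for most_volatile
}

m_steady   <- trend_metrics(freq_steady, time_index)
m_volatile <- trend_metrics(freq_volatile, time_index)

print(rbind(steady = m_steady, volatile = m_volatile))
```

Ranking by slope would favor the steady series for "trending_up", ranking by the absolute slope corresponds to "trending", and ranking by the residual standard deviation would favor the volatile series for "most_volatile".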

selectTopics

a vector of topic IDs which should be plotted; this option is only considered when the option "topic_id" is chosen for selectTopicsBy.

verboseLabels

a Boolean indicating whether additional topic information should be used to label subplots. The default is FALSE.

nCols

the number of columns along which topic subplots should be laid out.

Details

This function merges the computation of topic frequencies (topic_frequencies), creation of suitable labels (topics_terms_map) and the selection of topics by miscellaneous criteria (select_top_topics) into one step.

See Also

Other visualizations: plot_term_frequencies


sdaume/topicsplorrr documentation built on Dec. 22, 2021, 11:11 p.m.