plot_term_frequencies: Plot individual term frequencies by date in a faceted plot


View source: R/plot-topics.R

Description

plot_term_frequencies plots time series of term shares for a selection of terms in a faceted plot. Each term is displayed in its own subplot, and each time series is overlaid with a linear trendline.

Usage

plot_term_frequencies(termsByDate, timeBinUnit = "week",
  minTermTimeBins = 0.5, minTermOccurences = 10, topN = 25,
  selectTerms = NULL, selectTermsBy = "most_frequent",
  verboseLabels = FALSE, nCols = 5)

Arguments

termsByDate

a dataframe as returned by terms_by_date

timeBinUnit

a character string specifying the time period used as the bin unit when computing term frequencies. Valid values are "day", "week", "month", "quarter" and "year"; for the text sources processed in this package "week" is recommended and is the default. NOTE: when assigning weeks, Monday is treated as the first day of the week.
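
The Monday-based week assignment can be illustrated with base R. This is only a sketch of the binning idea, not the package's internal code; the example dates and the variable names dates and week_bin are made up for illustration.

```r
# Hypothetical example dates; 2021-12-20 is a Monday, 2021-12-19 a Sunday.
dates <- as.Date(c("2021-12-19", "2021-12-20", "2021-12-22", "2021-12-27"))

# cut() with breaks = "week" labels each date with the first day of its week;
# start.on.monday = TRUE (the base R default) makes weeks start on Monday.
week_bin <- as.Date(as.character(
  cut(dates, breaks = "week", start.on.monday = TRUE)
))

# One row per date with the week bin it falls into
print(data.frame(date = dates, week_bin = week_bin))
```

The Sunday 2021-12-19 falls into the week starting on Monday 2021-12-13, while 2021-12-20 and 2021-12-22 share the week starting on 2021-12-20.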

minTermTimeBins

a double in the range [0,1] specifying the minimum share of all unique timebins in which an occurrence of a term must have been recorded, i.e. a value of 0.5 (the default) requires that an occurrence of a term must have been recorded in at least 50% of all unique timebins covered by the dataset; terms that do not meet this threshold will not be included in the returned results.

minTermOccurences

an integer specifying the minimum number of total occurrences a term must have to be included in the results; terms that do not meet this threshold will not be included in the returned results.
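
The combined effect of minTermTimeBins and minTermOccurences can be sketched in base R as follows. The data frame counts, its columns and all values are hypothetical, and the package may compute these thresholds differently internally.

```r
# Hypothetical long-format counts: one row per term and time bin
counts <- data.frame(
  term = c("model", "model", "model", "virus", "virus", "outbreak"),
  bin  = c("w1", "w2", "w3", "w1", "w4", "w2"),
  n    = c(6, 5, 4, 2, 1, 12),
  stringsAsFactors = FALSE
)

# Number of unique time bins covered by the (hypothetical) dataset
n_bins <- length(unique(counts$bin))

# Share of all time bins in which each term occurs at least once
bin_share <- tapply(counts$bin, counts$term,
                    function(b) length(unique(b)) / n_bins)

# Total number of occurrences per term
total_n <- tapply(counts$n, counts$term, sum)

# Keep terms that meet both thresholds (0.5 and 10 mirror the defaults)
keep <- names(total_n)[as.vector(bin_share) >= 0.5 & as.vector(total_n) >= 10]
print(keep)
```

In this sketch "outbreak" has enough occurrences but appears in only one of four bins, and "virus" appears in half of the bins but only three times in total, so only "model" is kept.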

topN

the number of top terms to display, selected according to the criterion in selectTermsBy

selectTerms

a character vector of term patterns against which terms are matched for selection. Regular expression syntax can be used, e.g. if c("^mod", "an", "el$", "^outbreak$") is supplied for selectTerms, all terms that start with 'mod', contain 'an', end with 'el' or are exactly 'outbreak' are matched. The arguments selectTermsBy and selectTerms can be combined.
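
The pattern example above can be checked with base R regular expressions. This is a minimal sketch that assumes a term is selected if it matches any of the patterns; the terms vector is invented, and the package may apply the patterns differently (for instance to stemmed terms).

```r
# Patterns from the argument description
patterns <- c("^mod", "an", "el$", "^outbreak$")

# Hypothetical terms to test against the patterns
terms <- c("model", "pandemic", "travel", "outbreak", "outbreaks", "virus")

# A term is kept if it matches at least one pattern; joining the patterns
# with "|" builds a single alternation regex.
matched <- terms[grepl(paste(patterns, collapse = "|"), terms)]
print(matched)
```

Here "outbreaks" is not matched because "^outbreak$" only accepts the exact term, and "virus" matches none of the patterns.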

selectTermsBy

the selection approach which determines the metric by which terms will be sorted to select the topN terms. Currently, the following options are supported:

most_frequent

the default, select terms based on the total number of occurrences

trending_up

select terms with the largest upward trend; internally this is measured by the slope of a simple linear regression fit to a term's frequency series.

trending_down

select terms with the largest downward trend; internally this is measured by the slope of a simple linear regression fit to a term's frequency series.

trending

select terms with the largest upward or downward trend; internally this is measured by the absolute value of the slope of a simple linear regression fit to a term's frequency series.

most_volatile

select terms with the largest change throughout the covered time period; internally this is measured by the residual standard deviation of the linear model fit to a term's frequency series.
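
The trend and volatility metrics described above can be sketched with lm() from base R. The data frame freq, its columns and its values are hypothetical, and the package's own implementation may differ in detail; the sketch only shows how a slope and a residual standard deviation per term lead to the different rankings.

```r
# Hypothetical term shares over five consecutive time bins
freq <- data.frame(
  bin   = rep(1:5, times = 3),
  term  = rep(c("rising", "falling", "noisy"), each = 5),
  share = c(0.10, 0.20, 0.30, 0.40, 0.50,   # steady increase
            0.50, 0.45, 0.40, 0.35, 0.30,   # steady decrease
            0.30, 0.10, 0.50, 0.10, 0.30),  # no trend, large swings
  stringsAsFactors = FALSE
)

# Fit a simple linear regression per term and collect slope and residual SD
metrics <- do.call(rbind, lapply(split(freq, freq$term), function(d) {
  fit <- lm(share ~ bin, data = d)
  data.frame(
    term     = d$term[1],
    slope    = unname(coef(fit)["bin"]),
    resid_sd = sigma(fit),  # residual standard deviation of the fit
    stringsAsFactors = FALSE
  )
}))

# Top term under each selection approach
top_up       <- metrics$term[which.max(metrics$slope)]       # trending_up
top_down     <- metrics$term[which.min(metrics$slope)]       # trending_down
top_trending <- metrics$term[which.max(abs(metrics$slope))]  # trending
top_volatile <- metrics$term[which.max(metrics$resid_sd)]    # most_volatile
print(metrics)
```

The "noisy" series has a slope of zero but the largest residual standard deviation, which is why a term can rank first under most_volatile without trending in either direction.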

verboseLabels

a Boolean indicating whether a single term should be used as the label for each subplot or whether multiple term instances (i.e. variations of the term found in the original text source) should be used as labels. The default is FALSE.

nCols

the number of columns along which term subplots should be laid out.

Details

This function combines the computation of term frequencies (term_frequencies), the creation of suitable labels (terms_tokens_map) and the selection of terms by miscellaneous criteria (select_top_terms) into one step.

See Also

Other visualizations: plot_topic_frequencies


sdaume/topicsplorrr documentation built on Dec. 22, 2021, 11:11 p.m.