select_top_terms: Select top terms by count, trend metric and/or name pattern

View source: R/term-analysis.R

Description

select_top_terms selects a specified number of top terms based on properties of their term frequencies, such as total count, trend or volatility, optionally restricted to terms matching a name pattern. It is typically used to select term frequency time series for plotting and exploratory analysis. See the descriptions of the function arguments for the available selection options.

Usage

select_top_terms(termFrequencies, topN = 25,
  selectBy = "most_frequent", selectTerms = NULL)

Arguments

termFrequencies

a dataframe of term frequencies as returned by term_frequencies()

topN

the number of top terms to return, ranked by the selection criterion given in selectBy

selectBy

the selection approach which determines the metric by which terms will be sorted to select the topN terms. Currently, the following options are supported:

most_frequent

the default, select terms based on the total number of occurrences

trending_up

select terms with the largest upward trend; internally this is measured by the slope of a simple linear regression fit to a term's frequency series.

trending_down

select terms with the largest downward trend; internally this is measured by the slope of a simple linear regression fit to a term's frequency series.

trending

select terms with the largest upward or downward trend; internally this is measured by the absolute value of the slope of a simple linear regression fit to a term's frequency series.

most_volatile

select terms whose frequencies vary most throughout the covered time period; internally this is measured by the residual standard deviation of the linear model fit to a term's time frequency series.

selectTerms

a character vector of patterns against which terms are matched for selection. Regular expression syntax can be used, e.g. if c("^mod", "an", "el$", "^outbreak$") is supplied for selectTerms, all terms that start with 'mod', contain 'an', end with 'el' or are exactly 'outbreak' are matched. The arguments selectBy and selectTerms can be combined.
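
The pattern semantics described for selectTerms can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: the vocabulary in terms is hypothetical, and combining the patterns into a single alternation is an assumption about how "match any pattern" could be expressed.

```r
# Hypothetical vocabulary, used only for illustration
terms <- c("model", "modern", "pandemic", "panel",
           "outbreak", "outbreaks", "risk")

# The example patterns from the selectTerms description
patterns <- c("^mod", "an", "el$", "^outbreak$")

# Assumption: a term is selected if it matches at least one pattern,
# expressed here by joining the patterns with the regex alternation "|"
combined <- paste(patterns, collapse = "|")
matched <- terms[grepl(combined, terms)]

print(matched)
```

In this sketch 'outbreaks' and 'risk' are not matched, because '^outbreak$' requires the exact term and none of the other patterns apply to them.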

Value

a dataframe specifying trend metrics employed for selecting top terms, where:

term

a unique term

n_term_total

the total number of a term's occurrences in the dataset

slope

the slope coefficient of a linear model fit to this term's time frequency series

volatility

the residual standard deviation of a linear model fit to this term's time frequency series

trend

a categorisation of the term frequency trend
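
The slope and volatility columns can be approximated with a per-term linear model in base R. This is a rough sketch under stated assumptions, not the package's actual code: the toy data frame freq, its column names period and n, and the helper term_metrics are all hypothetical, and time is assumed to be encoded as an integer index.

```r
# Toy frequency series for two hypothetical terms over six time bins
freq <- data.frame(
  term   = rep(c("alpha", "beta"), each = 6),
  period = rep(1:6, times = 2),
  n      = c(1, 2, 3, 4, 5, 6,   # alpha: perfectly linear, rising
             5, 1, 6, 2, 7, 3)   # beta: noisy, slight rise
)

# Hypothetical helper: fit n ~ period for one term and extract the metrics
term_metrics <- function(d) {
  fit <- lm(n ~ period, data = d)
  data.frame(
    term         = d$term[1],
    n_term_total = sum(d$n),                      # total occurrences
    slope        = unname(coef(fit)["period"]),   # trend direction and size
    volatility   = sigma(fit)                     # residual standard deviation
  )
}

# One row of metrics per term
metrics <- do.call(rbind, lapply(split(freq, freq$term), term_metrics))
rownames(metrics) <- NULL

print(metrics)
```

Sorting such a table by n_term_total, slope, abs(slope) or volatility and keeping the first topN rows would correspond to the selectBy options described under Arguments.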


sdaume/topicsplorrr documentation built on Dec. 22, 2021, 11:11 p.m.