sentiment_topics: Compute time series of topical sentiments

View source: R/timeSeries.R

sentiment_topics    R Documentation

Compute time series of topical sentiments

Description

Derive topical time series of sentiment from an LDA() or rJST() model. The time series are built from the estimated topic proportions combined with either the internal document sentiment (for LDA models) or the topical sentiment (for rJST models).

Usage

sentiment_topics(
  x,
  period = c("year", "quarter", "month", "day", "identity"),
  rolling_window = 1,
  scale = TRUE,
  scaling_period = c("1900-01-01", "2099-12-31"),
  plot = c(FALSE, TRUE, "silent"),
  plot_ridgelines = TRUE,
  as.xts = TRUE,
  ...
)

plot_sentiment_topics(
  x,
  period = c("year", "quarter", "month", "day"),
  rolling_window = 1,
  scale = TRUE,
  scaling_period = c("1900-01-01", "2099-12-31"),
  plot_ridgelines = TRUE,
  ...
)

Arguments

x

an LDA() or rJST() model populated with internal dates and/or internal sentiment.

period

the sampling period within which the sentiment of documents is averaged. period = "identity" is a special case that returns document-level variables before any aggregation takes place, which is useful to quickly compute topical sentiment at the document level.
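To illustrate how documents could be grouped into sampling periods, here is a minimal base R sketch. The helper period_label() is hypothetical and not part of sentopics; it only shows one plausible way of mapping dates to year, quarter, month, or day labels, with "identity" giving each document its own group so that no averaging happens.

```r
# Hypothetical helper (not part of sentopics): map each document date to the
# label of the sampling period it falls into.
period_label <- function(dates, period = c("year", "quarter", "month", "day", "identity")) {
  period <- match.arg(period)
  dates <- as.Date(dates)
  switch(period,
    year     = format(dates, "%Y"),
    # Months 1-3 -> Q1, 4-6 -> Q2, 7-9 -> Q3, 10-12 -> Q4
    quarter  = paste0(format(dates, "%Y"), "-Q",
                      (as.integer(format(dates, "%m")) - 1L) %/% 3L + 1L),
    month    = format(dates, "%Y-%m"),
    day      = format(dates, "%Y-%m-%d"),
    # One group per document: values stay at the document level
    identity = as.character(seq_along(dates))
  )
}

dates <- as.Date(c("2020-01-15", "2020-02-03", "2020-04-30", "2021-12-31"))
period_label(dates, "quarter")
# "2020-Q1" "2020-Q1" "2020-Q2" "2021-Q4"
```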

rolling_window

if greater than 1, determines the rolling window used to compute a moving average of sentiment. The rolling window is expressed in the period unit and relies on actual dates (i.e., it is not affected by unequally spaced data points).
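A date-based moving average can be sketched in base R as follows. This assumes monthly periods and a right-aligned window (the current month plus the preceding ones); both are assumptions, and sentopics itself relies on zoo::rollapply(). The function rolling_mean_months() is hypothetical. Because the window is defined on calendar months rather than on a count of observations, gaps in the data shorten the effective window instead of pulling in older points.

```r
# Hypothetical sketch (not the sentopics implementation): a moving average
# whose window is measured in calendar months, right-aligned on each date.
rolling_mean_months <- function(values, dates, window = 3) {
  dates <- as.Date(dates)
  # Running month index, e.g. January 2020 -> 2020 * 12 + 0
  month_idx <- as.integer(format(dates, "%Y")) * 12L +
    as.integer(format(dates, "%m")) - 1L
  vapply(seq_along(values), function(i) {
    # Keep observations from the current month and the (window - 1) months before it
    in_window <- month_idx <= month_idx[i] & month_idx > month_idx[i] - window
    mean(values[in_window])
  }, numeric(1))
}

# Monthly series with a gap: March and April 2020 are missing
dates  <- as.Date(c("2020-01-01", "2020-02-01", "2020-05-01", "2020-06-01"))
values <- c(1, 2, 3, 4)
rolling_mean_months(values, dates, window = 3)
# 1.0 1.5 3.0 3.5
```

With a 3-month window, the May value averages May alone, since March and April are absent. A window over the last three observations would instead have averaged January, February, and May (giving 2).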

scale

if TRUE, the resulting time series are scaled to a mean of zero and a standard deviation of one. This argument also has the side effect of attaching the scaled sentiment values to the input object as docvars with the _scaled suffix.

scaling_period

the date range over which the scaling should be applied. This is particularly useful to normalize only the beginning of the time series.
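The interplay of scale and scaling_period can be sketched in base R, under the assumption (not confirmed by this page) that the mean and standard deviation are estimated on the observations falling inside scaling_period and then applied to the whole series. scale_series() is a hypothetical helper.

```r
# Hypothetical sketch: standardize a series with moments computed on a
# reference date range only, then apply the same transformation everywhere.
scale_series <- function(values, dates, scaling_period = c("1900-01-01", "2099-12-31")) {
  dates <- as.Date(dates)
  # Observations used to estimate the mean and standard deviation
  ref <- dates >= as.Date(scaling_period[1]) & dates <= as.Date(scaling_period[2])
  mu  <- mean(values[ref])
  sdv <- sd(values[ref])
  (values - mu) / sdv
}

dates  <- as.Date(c("2019-01-01", "2020-01-01", "2021-01-01", "2022-01-01"))
values <- c(1, 3, 5, 9)

# Default range covers all dates: the result has mean 0 and standard deviation 1
scale_series(values, dates)

# Normalize with respect to the first two years only; later values are
# expressed in units of the early-period standard deviation
scale_series(values, dates, scaling_period = c("2019-01-01", "2020-12-31"))
```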

plot

if TRUE, prints a plot of the time series and attaches it as an attribute to the returned object. If 'silent', the plot is not printed but is still attached as an attribute.

plot_ridgelines

if TRUE, time series are plotted as ridgelines, which requires the ggridges package to be installed. If FALSE, the plot uses only standard ggplot2 functions. If the argument is missing and ggridges is not installed, the function quietly switches to a plain ggplot2 output.
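The fallback behaviour described for plot_ridgelines can be sketched with base R's missing() and requireNamespace(). resolve_ridgelines() is hypothetical, and raising an error when plot_ridgelines = TRUE is explicitly requested without ggridges is an assumption; this page only states that ggridges is required in that case.

```r
# Hypothetical helper: decide whether ridgelines should be drawn.
resolve_ridgelines <- function(plot_ridgelines = TRUE) {
  if (missing(plot_ridgelines)) {
    # Argument not supplied: use ridgelines only if ggridges can be loaded,
    # otherwise quietly fall back to plain ggplot2
    return(requireNamespace("ggridges", quietly = TRUE))
  }
  # Explicit TRUE without ggridges: fail loudly (assumed behaviour)
  if (isTRUE(plot_ridgelines) && !requireNamespace("ggridges", quietly = TRUE)) {
    stop("plot_ridgelines = TRUE requires the 'ggridges' package.")
  }
  isTRUE(plot_ridgelines)
}

resolve_ridgelines(FALSE)  # always FALSE: plain ggplot2 output
resolve_ridgelines()       # TRUE only if ggridges is installed
```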

as.xts

if TRUE, returns an xts::xts object. Otherwise, returns a data.frame.

...

other arguments passed on to zoo::rollapply() or mean() and sd().

Details

A topical sentiment is computed at the document level for each topic. For an LDA model, the sentiment of each topic is considered equal to the document sentiment (i.e., s_i = s \forall i \in K). For an rJST model, the topical sentiments result from the proportions in the sentiment layer under each topic. To compute the topical time series, the topical sentiments of all documents in a period are aggregated, weighting each document by its topic proportion. For example, for a given topic, the topical sentiment in period t is computed using:

s_t = \frac{\sum_{d = 1}^D s_d \times \theta_d}{\sum_{d = 1}^D \theta_d}

where s_d is the sentiment of the topic in document d, \theta_d is the proportion of that topic in document d, and both sums run over the D documents dated within period t.
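A worked example of this weighted average in base R, for the LDA case where every topic inherits the document sentiment. The inputs (dates, topic proportions, sentiment scores) are made up for illustration and quarterly periods are assumed; for an rJST model, s would instead hold a distinct sentiment per topic. This is a sketch of the formula, not the package's internal code.

```r
# Made-up inputs: 5 documents, 2 topics
dates <- as.Date(c("2020-01-10", "2020-02-20", "2020-03-05", "2020-07-01", "2020-08-15"))
# theta[d, k]: proportion of topic k in document d (rows sum to 1)
theta <- matrix(c(0.8, 0.2,
                  0.5, 0.5,
                  0.1, 0.9,
                  0.6, 0.4,
                  0.3, 0.7), ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("topic1", "topic2")))
# LDA case: each topic takes the document sentiment (same value in every column)
doc_sentiment <- c(0.5, -0.2, 0.1, 0.4, -0.6)
s <- matrix(doc_sentiment, nrow = nrow(theta), ncol = ncol(theta),
            dimnames = dimnames(theta))

# Quarterly period label of each document
quarter <- paste0(format(dates, "%Y"), "-Q",
                  (as.integer(format(dates, "%m")) - 1L) %/% 3L + 1L)

# s_t = sum_d(s_d * theta_d) / sum_d(theta_d), per topic (column) and period (row)
numerator   <- rowsum(s * theta, quarter)
denominator <- rowsum(theta, quarter)
topical_sentiment <- numerator / denominator
topical_sentiment
```

For topic1 in 2020-Q1, this gives (0.5 × 0.8 - 0.2 × 0.5 + 0.1 × 0.1) / (0.8 + 0.5 + 0.1) = 0.31 / 1.4 ≈ 0.221. Documents where the topic is barely present (here, the third one for topic1) contribute little to that topic's series.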

Value

an xts::xts or data.frame containing the time series of topical sentiments.

See Also

sentopics_sentiment, sentopics_date

Other series functions: proportion_topics(), sentiment_breakdown(), sentiment_series()

Examples

lda <- LDA(ECB_press_conferences_tokens)
lda <- grow(lda, 100)
sentiment_topics(lda)

# plot shortcut
plot_sentiment_topics(lda, period = "month", rolling_window = 3)
# with or without ridgelines
plot_sentiment_topics(lda, period = "month", plot_ridgelines = FALSE)

# also available for rJST models with internal sentiment computation
rjst <- rJST(ECB_press_conferences_tokens, lexicon = LoughranMcDonald)
rjst <- grow(rjst, 100)
sentopics_sentiment(rjst, override = TRUE)
sentiment_topics(rjst)

sentopics documentation built on May 31, 2023, 8:26 p.m.