sentiment_topics: Compute time series of topical sentiments
In sentopics: Tools for Joint Sentiment and Topic Analysis of Textual Data

Description

Derive topical time series of sentiment from an LDA() or rJST() model. The time series are created by leveraging estimated topic proportions and internal sentiment (for LDA models) or topical sentiment (for rJST models).

Usage

sentiment_topics(
  x,
  period = c("year", "quarter", "month", "day", "identity"),
  rolling_window = 1,
  scale = TRUE,
  scaling_period = c("1900-01-01", "2099-12-31"),
  plot = c(FALSE, TRUE, "silent"),
  plot_ridgelines = TRUE,
  as.xts = TRUE,
  ...
)

plot_sentiment_topics(
  x,
  period = c("year", "quarter", "month", "day"),
  rolling_window = 1,
  scale = TRUE,
  scaling_period = c("1900-01-01", "2099-12-31"),
  plot_ridgelines = TRUE,
  ...
)

Arguments

`x`	an `LDA()` or `rJST()` model populated with internal dates and/or internal sentiment.
`period`	the sampling period within which the sentiment of documents will be averaged. `period = "identity"` is a special case that returns document-level variables before the aggregation happens. This is useful for quickly computing topical sentiment at the document level.
`rolling_window`	if greater than 1, determines the rolling window used to compute a moving average of sentiment. The rolling window is based on the period unit and relies on actual dates (i.e., it is not affected by unequally spaced data points).
`scale`	if `TRUE`, the resulting time series will be scaled to a mean of zero and a standard deviation of 1. This argument also has the side effect of attaching scaled sentiment values as docvars to the input object with the `_scaled` suffix.
`scaling_period`	the date range over which the scaling should be applied. Particularly useful to normalize only the beginning of the time series.
`plot`	if `TRUE`, prints a plot of the time series and attaches it as an attribute to the returned object. If `'silent'`, does not print the plot but still attaches it as an attribute.
`plot_ridgelines`	if `TRUE`, time series are plotted as ridgelines. Requires the `ggridges` package to be installed. If `FALSE`, the plot will use only standard `ggplot2` functions. If the argument is missing and the package `ggridges` is not installed, this will quietly switch to a `ggplot2` output.
`as.xts`	if `TRUE`, returns an xts::xts object. Otherwise, returns a data.frame.
`...`	other arguments passed on to `zoo::rollapply()` or `mean()` and `sd()`.
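
To picture what "relies on actual dates" means for `rolling_window`, here is a minimal base R sketch of a date-based moving average over unequally spaced points. The data (`dates`, `value`) and the helper `date_window_mean()` are hypothetical and written for illustration only; this is not the package's implementation, and details such as how incomplete windows at the start of the series are handled may differ.

```r
# Hypothetical monthly sentiment with a gap: March 2020 has no observation
dates <- as.Date(c("2020-01-01", "2020-02-01", "2020-04-01", "2020-05-01"))
value <- c(1, 2, 4, 5)

# Month index, so that window membership depends on calendar distance
# rather than on the position of the observation in the vector
lt <- as.POSIXlt(dates)
month_id <- (lt$year + 1900) * 12 + lt$mon

# Mean of all observations falling in the last `window` months (inclusive)
date_window_mean <- function(i, window = 3) {
  in_window <- month_id > month_id[i] - window & month_id <= month_id[i]
  mean(value[in_window])
}

rolled <- vapply(seq_along(value), date_window_mean, numeric(1))
rolled
```

In this sketch the April value averages only February and April, because January lies outside the three-month window. A window defined by position (the last three observations) would have included January as well.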

Details

A topical sentiment is computed at the document level for each topic. For an LDA model, the sentiment of each topic is taken to be equal to the document sentiment (i.e. s_i = s \forall i \in K). For an rJST model, the topical sentiments result from the proportions in the sentiment layer under each topic. To compute the topical time series, the topical sentiments of all documents in a period are aggregated according to their respective topic proportions. For example, for a given topic, the topical sentiment in period t is computed using:

s_t = \frac{\sum_{d = 1}^D s_d \times \theta_d}{\sum_{d = 1}^D \theta_d}

where s_d is the sentiment of the topic in document d, \theta_d is the proportion of the topic in document d, and the sums run over the D documents of period t.
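
As a minimal sketch of the formula above, the following base R snippet computes s_t for a single topic from made-up document-level values. The vectors `period`, `s_d`, and `theta_d` are hypothetical illustration data, not output of the package, and the code does not reproduce the package's internal implementation.

```r
# Hypothetical document-level values for a single topic (illustration only)
period  <- c("2020", "2020", "2021", "2021")  # period of each document
s_d     <- c(0.5, -0.1, -0.2, 0.4)            # topical sentiment per document
theta_d <- c(0.6, 0.2, 0.1, 0.4)              # topic proportion per document

# s_t = sum(s_d * theta_d) / sum(theta_d), computed within each period
num <- tapply(s_d * theta_d, period, sum)
den <- tapply(theta_d, period, sum)
s_t <- num / den
s_t
```

Documents in which the topic has a larger proportion weigh more in the period value. If all proportions in a period are equal, the formula reduces to the simple mean of the document sentiments.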

Value

An xts::xts object, or a data.frame if `as.xts = FALSE`, containing the time series of topical sentiments.

Examples

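library(sentopics)
# build and estimate an LDA model on the ECB press conference tokens
lda <- LDA(ECB_press_conferences_tokens)
lda <- fit(lda, 100)
# topical sentiment series with the default (yearly) period
sentiment_topics(lda)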

# plot shortcut
plot_sentiment_topics(lda, period = "month", rolling_window = 3)
# with or without ridgelines
plot_sentiment_topics(lda, period = "month", plot_ridgelines = FALSE)

# also available for rJST models with internal sentiment computation
rjst <- rJST(ECB_press_conferences_tokens, lexicon = LoughranMcDonald)
rjst <- fit(rjst, 100)
sentopics_sentiment(rjst, override = TRUE)
sentiment_topics(rjst)

sentopics documentation built on Sept. 20, 2024, 5:06 p.m.