In SentometricsResearch/sentometrics: An Integrated Framework for Textual Sentiment Time Series Aggregation and Prediction

knitr::opts_chunk$set(warning = FALSE, message = FALSE, fig.width = 7, fig.height = 4, fig.align = "center")

This tutorial provides a guide on how to flexibly aggregate a qualitative corpus into many time series indices.

Preparation

library("sentometrics")
library("ggplot2")
library("data.table")

data("usnews")
data("list_lexicons")
data("list_valence_shifters")

Document-level sentiment aggregation into time series

To aggregate document-level sentiment scores into time series requires only to specify a few parameters regarding the weighting and the time frequency.

corpus <- sento_corpus(usnews)

s <- compute_sentiment(
  corpus,
  sento_lexicons(list_lexicons[c("GI_en", "LM_en")]),
  how = "counts"
)

ctr <- ctr_agg(howDocs = "proportional", howTime = "equal_weight", by = "month", lag = 6)
measures <- aggregate(s, ctr)

The obtained measures can be plotted, all of them, or according to the three time series dimensions (the corpus features, the lexicons and the time weighting schemes).

plot(measures)
plot(measures, "features")
plot(measures, "lexicons")
plot(measures, "time")

Document-level sentiment aggregation into time series (bis)

The aggregation into sentiment time series does not have to be done in two steps. Below one-step approach is recommended because very easy!

corpus <- sento_corpus(usnews)
lexicons <- sento_lexicons(list_lexicons[c("GI_en", "LM_en", "HENRY_en")])

ctr <- ctr_agg(howWithin = "counts",
               howDocs = "proportional",
               howTime = "linear", by = "week", lag = 14)
measures <- sento_measures(corpus, lexicons, ctr)

measures

Similar to extracting the peak documents in a sentiment table, we extract here the peak dates in the sentiment time series matrix. Detected below are the 5 dates where average sentiment across all sentiment measures is lowest.

peakDatesNeg <- peakdates(measures, n = 5, type = "neg", do.average = TRUE)

dtPeaks <- as.data.table(subset(measures, date %in% peakDatesNeg))

data.table(dtPeaks[, "date"], avg = rowMeans(dtPeaks[, -1]))

Sentiment aggregation into time series

Sentiment measures can be further aggregated across any of the three dimensions. The computed time series are averages across the relevant measures.

corpus <- sento_corpus(usnews)
lexicons <- sento_lexicons(list_lexicons[c("GI_en", "LM_en", "HENRY_en")])
ctr <- ctr_agg(howWithin = "counts",
               howDocs = "proportional",
               howTime = c("linear", "equal_weight"), by = "week", lag = 14)

measures <- sento_measures(corpus, lexicons, ctr)

measuresAgg <- aggregate(measures,
                         features = list("journal" = c("wsj", "wapo")),
                         time = list("linequal" = c("linear", "equal_weight")))

get_dimensions(measuresAgg) # inspect the components of the three dimensions

With the subset() function, a sento_measures object can be subsetted based on a condition, and time series with certain dimension components can be selected or deleted.

m1 <- subset(measuresAgg, 1:100) # first 100 weeks
m2 <- subset(measuresAgg, date %in% tail(get_dates(measuresAgg), 100)) # last 100 weeks
m3 <- subset(measuresAgg, select = "journal")
m4 <- subset(measuresAgg, delete = list(c("economy", "GI_en"), c("noneconomy", "HENRY_en")))

Sentiment aggregation into time series (bis)

To keep it at its simplest, all the sentiment measures computed can be condensed in a few global sentiment time series. The computation can be run in a weighted way as well.

corpus <- sento_corpus(usnews)
lexicons <- sento_lexicons(list_lexicons[c("GI_en", "LM_en", "HENRY_en")])
ctr <- ctr_agg(howWithin = "counts",
               howDocs = "proportional",
               howTime = c("linear", "equal_weight"), by = "week", lag = 14)

measures <- sento_measures(corpus, lexicons, ctr)

measuresGlobal <- aggregate(measures, do.global = TRUE)

measuresGlobal

The output in this case is not a specific sentometrics sento_measures object, but a data.table object. Below produces a nice plot using the ggplot2 package.

ggplot(melt(measuresGlobal, id.vars = "date")) +
  aes(x = date, y = value, color = variable) +
  geom_line() +
  scale_x_date(name = "Date", date_labels = "%m-%Y") +
  scale_y_continuous(name = "Sentiment") +
  theme_bw() + 
  sentometrics:::plot_theme(legendPos = "top") # a trick to make the plot look nicer

SentometricsResearch/sentometrics documentation built on April 3, 2025, 3:16 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

SentometricsResearch/sentometrics
An Integrated Framework for Textual Sentiment Time Series Aggregation and Prediction

In SentometricsResearch/sentometrics: An Integrated Framework for Textual Sentiment Time Series Aggregation and Prediction

Document-level sentiment aggregation into time series

Document-level sentiment aggregation into time series (bis)

Sentiment aggregation into time series

Sentiment aggregation into time series (bis)

R Package Documentation

Browse R Packages

We want your feedback!

SentometricsResearch/sentometrics An Integrated Framework for Textual Sentiment Time Series Aggregation and Prediction

In SentometricsResearch/sentometrics: An Integrated Framework for Textual Sentiment Time Series Aggregation and Prediction

Document-level sentiment aggregation into time series

Document-level sentiment aggregation into time series (bis)

Sentiment aggregation into time series

Sentiment aggregation into time series (bis)

R Package Documentation

Browse R Packages

We want your feedback!

SentometricsResearch/sentometrics
An Integrated Framework for Textual Sentiment Time Series Aggregation and Prediction