knitr::opts_chunk$set(echo = FALSE)

Executive summary

We performed text sentiment analysis on the ECB press conference speeches. Using four different sentiment dictionaries, we constructed four sentiment score indicators. Scoring is based on unigrams (individual words), with an adjustment for negation words through bigrams (pairs of adjacent words). Each sentiment indicator is built from the sum of these scores, and its change is measured over time.

library(dplyr)
library(tidytext)
library(lubridate)
library(ggplot2)
library(tidyverse)
library(GGally)
library(gridExtra)
source("text_analytics.R")  # Stores many of the functions used below
df <- read.csv("data/ecb_speeches.csv", stringsAsFactors = FALSE) %>%
  mutate(date = ymd(date), type = as.factor(type), speaker = as.factor(speaker))

# Extract press speech and answers
df_ecb <- df %>%
  filter(type %in% c("speech", "answer")) %>%
  group_by(date) %>%
  summarise(text = paste0(text, collapse = "."))

# Remove years as stop words
custom_stop_words <- bind_rows(tibble(word = as.character(1980:2030),
                                      lexicon = "custom"),
                               stop_words)

ecb_unigrams <- df_ecb %>%
  make_ngrams(custom_stop_words = custom_stop_words) 

Methodology

Preprocessing: tokenising words and attaching sentiment

We started by extracting ECB press conference transcripts since 2000, then converted each transcript into a list of individual words, a process known as tokenising.
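The make_ngrams() helper comes from text_analytics.R, which is not reproduced here. As a rough sketch of what it plausibly does with tidytext (the body below is an assumption, not the actual source):

# A hypothetical sketch of make_ngrams(); the actual implementation
# lives in text_analytics.R and may differ
make_ngrams <- function(df, n = 1, custom_stop_words = tidytext::stop_words,
                        remove_stop_words = TRUE) {
  if (n == 1) {
    out <- df %>%
      unnest_tokens(word, text)          # one row per lower-cased word
    if (remove_stop_words) {
      out <- out %>% anti_join(custom_stop_words, by = "word")
    }
  } else {
    out <- df %>%
      unnest_tokens(bigram, text, token = "ngrams", n = n) %>%
      separate(bigram, c("word1", "word2"), sep = " ") %>%  # split pairs for negation checks
      count(date, word1, word2)          # keep counts, as the bigram chunk expects an n column
  }
  out
}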

List of top words and their associated sentiments

It is a commonly used technique in text analytics to compare individual words against a dictionary of sentiments to determine the positivity of a given text. We use the four dictionaries below to measure sentiment, as there is currently no publicly available dictionary tailored specifically to macroeconomic or central bank analysis.

| Dictionary | Description |
|------------|-------------|
| AFINN | Scale of -5 (very negative) to +5 (very positive) |
| Loughran | For analysis of financial statements, with multiple types of sentiments |
| NRC | Crowd-sourced, emotion-based |
| Bing | Labels words as positive or negative |
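For reference, each lexicon can be pulled up with tidytext's get_sentiments() (the AFINN, Loughran, and NRC lexicons are downloaded on first use via the textdata package):

get_sentiments("afinn")     # word + value in -5..5
get_sentiments("loughran")  # word + sentiment class (negative, uncertainty, litigious, ...)
get_sentiments("nrc")       # word + emotion (anger, joy, trust, positive, negative, ...)
get_sentiments("bing")      # word + positive/negative label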

From an extract of the top words with associated sentiments (Table 1), we see that the sentiments assigned by each dictionary vary considerably.

Table 1. Top words and their associated sentiments

# Compare how words categorization differs across sets
ecb_unigrams %>%
  count_ngrams %>%
  add_sentiments(c("afinn", "loughran", "nrc", "bing")) %>%
  compare_words_sentiment %>% 
  head(6)

Calculating a sentiment score

We then calculate a sentiment score by adding to the score when a positive word appears and subtracting when a negative word appears. In the case of the AFINN dictionary, we multiply the word count n by the word's sentiment value before summing. In Figure 1, we observe that each indicator is fairly different, suggesting that the choice of dictionary has a significant impact on the final results.
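Concretely, the unweighted score for the press conference on date $t$ is

$$S_t = \sum_{w} n_{w,t} \, s_w$$

where $n_{w,t}$ is the count of word $w$ on date $t$ and $s_w$ is the word's dictionary score: $\pm 1$ for the binary lexicons (Bing, Loughran, NRC) and an integer in $[-5, 5]$ for AFINN.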

Using zero as a baseline, we observe that most of the dictionaries tend to give a positive overall sentiment to the speeches, but Loughran views them negatively in most cases.

High-scoring words from the dictionaries

To account for these differences, we extracted the top positive and negative contributors to the scores across the dictionaries, and find that the scores are attributable to fairly different words (Figure 2).

ecb_scores <- ecb_unigrams %>%
  add_sentiments(all = TRUE) %>%
  group_by(date) %>%
  calc_sentiment_score(wt = "n")

g <- ecb_scores %>%  
  ggplot(aes(date, score, colour = method)) 
g1 <- g + geom_line() + 
  geom_hline(yintercept = 0) +
  facet_wrap(~method, ncol = 1, scales = "free_y") +
  theme(legend.position = "bottom") +
  ggtitle("Fig 1. Unweighted sentiment score")

g2 <- ecb_unigrams %>%
  add_sentiments(all = TRUE) %>%
  group_by(word) %>%
  calc_sentiment_score(wt = "n") %>%
  group_by(method) %>%
  top_n(10, abs(score)) %>%
  ungroup %>%
  mutate(word = reorder(word, score)) %>% 
  ggplot(aes(word, score, fill=factor(score > 0))) +
    geom_col(show.legend = FALSE) +
    facet_wrap(~method, scales = "free_y") +
    coord_flip() +
    ggtitle("Fig 2. Top contributors to sentiment")

grid.arrange(g1, g2, ncol=2)

Calculating weighted scores

As words like growth or inflation may be used very often in any ECB speech, we may want to place more weight on words that occur less often. To adjust for this, we use "term frequency - inverse document frequency" (tf-idf) as a weight. tf-idf measures how unique a particular word is relative to the other documents: a relatively unique word is given a higher weight, while a word that appears frequently in every single document is given a smaller weight. Using tf-idf as weights, we calculated the weighted scores in Figure 3.
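As a concrete definition (this is what tidytext's bind_tf_idf(), used in the chunk below, computes), the weight for word $w$ in the document for date $d$ is

$$\text{tf-idf}(w, d) = \frac{n_{w,d}}{\sum_{w'} n_{w',d}} \times \ln\!\left(\frac{N}{\left|\{d' : w \in d'\}\right|}\right)$$

where $N$ is the total number of documents (speech dates). The idf factor is zero for a word that appears in every document, so such words are weighted out entirely.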

Adjusting for bigrams

Removing negation words

Unigram analysis may not be sufficient, as some words are used with the opposite intention, such as in "negative growth" or "no money". Hence we extend the analysis to bigrams, which are simply words paired with their adjacent words.

To adjust for a negated word, we first remove its original impact and then add the reversed impact, which is equivalent to adding twice the reversed impact. From Figure 4, we observe that negation has a very small impact on the overall sentiment score, which suggests that negation terms are not used heavily in central bank speeches.
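In formula terms, mirroring the tf_idf2 computation in the chunk below (and assuming calc_sentiment_score() multiplies the weight by the word's dictionary score, as in the unweighted case): if word $w$ appears $n_w$ times on a given date, of which $n_w^{neg}$ occurrences follow a negation word, the adjustment to the weighted score is

$$\Delta_w = -2 \times \frac{n_w^{neg}}{n_w} \times \text{tf-idf}_w \times s_w$$

so, for example, a positive-scoring word that is always negated has its contribution flipped from $+\text{tf-idf}_w \, s_w$ to $-\text{tf-idf}_w \, s_w$.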

unigrams_wt <- ecb_unigrams %>%
  group_by(date) %>%
  count_ngrams %>%
  bind_tf_idf(word, date, n) %>%
  ungroup

weighted_scores <- unigrams_wt %>%
  add_sentiments(all = TRUE) %>%
  group_by(date, method) %>%
  calc_sentiment_score(wt = "tf_idf") 

combined_scores <- bind_rows(ecb_scores %>% mutate(weight = "unwt"),
                            weighted_scores %>% mutate(weight = "wt"))

# Centre all scores and plot weight and unweighted on the same chart
g3 <- combined_scores %>%
  group_by(method, weight) %>%
  mutate(score = scale(score)) %>%
  ggplot(aes(date, score, colour = weight)) +
  geom_line() +
  facet_wrap(~method, ncol = 1, scales = "free_y") +
  geom_hline(yintercept = 0, lwd=0.3, lty = 5) +
  ggtitle("Fig 3. Frequency weighted sentiment scores (standardized)") +
  theme(legend.position = "bottom")

bigrams_separated <- df_ecb %>%
  make_ngrams(2, remove_stop_words = FALSE)

negation_words <- c("not", "no", "never", "without", "negative", "weak")

negated_words <- bigrams_separated %>%
  filter(word1 %in% negation_words) %>%
  ungroup()

##################
# adjust weighted scores
##################

# Calculate negative scores from negated words
negative_scores_by_word <- negated_words %>% 
  rename(n2 = n, word = word2) %>%
  inner_join(unigrams_wt, by = c("date", "word")) %>%
  mutate(tf_idf2 = (n2 * 2 * -1)/n * tf_idf) %>%  # We subtract twice to adjust for the "wrong" impact on the raw score
  add_sentiments(all = TRUE) %>%
  group_by(date, word, word1) %>%
  calc_sentiment_score(wt = "tf_idf2") 

# Total negative scores by date
negative_scores <- negative_scores_by_word %>%
  group_by(date, method) %>%
  summarise(score = sum(score))

# Join negation scores to the original weighted scores, and recalculate combined scores
bigram_adjusted_scores <- negative_scores %>%
  rename(score2 = score) %>%
  right_join(weighted_scores, by = c("date", "method")) %>%
  mutate(score3 = ifelse(is.na(score2), score, score + score2))

# Plot negation-adjusted weighted scores
g4 <- bigram_adjusted_scores %>% select(date, method, score, score2, score3) %>%
  rename(unadjusted = score, negation = score2, adjusted = score3) %>%
  gather(score_type, scores, -date, -method) %>%
  ggplot(aes(date, scores, colour = score_type)) +
  geom_line() +
  facet_wrap(~method, ncol=1, scales = "free_y") +
  geom_hline(yintercept = 0, lwd=0.3, lty = 5) + 
  ggtitle("Fig 4. Negation adjusted weighted score") +
  theme(legend.position = "bottom") 

grid.arrange(g3, g4, ncol = 2)

Predictive ability of sentiment

To test our sentiment scores, we compared the indicators against the one-day performance of Euribor and bond futures on the day of each announcement.

The performance of the sentiment indicators as market indicators is poor: the correlation between each indicator and the returns of the assets is consistently below 10%.

Fig 5. Correlation between weighted indicators and 1 day return on assets

bonds_raw <- read.csv("data/asset_data.csv")

bonds_data <- bonds_raw %>%
  rename(ER1 = ER1.Comdty, ER6 = ER6.Comdty, schatz = DU1.Comdty, 
         bobl = OE1.Comdty, bund = RX1.Comdty, eurusd = EURUSD.Curncy) %>%
  select(-ER6) %>%
  mutate_at(vars(-Date), list(ret = ~ . / lag(.) - 1)) %>%  # One-day simple returns
  mutate(date = mdy(Date)) %>%
  select(date, ends_with("ret"))

all_data <- bigram_adjusted_scores %>%
  group_by(method) %>%
  mutate(adj_score = score3) %>%
  select(date, method, adj_score) %>%
  mutate(adj_score = c(NA,diff(adj_score))) %>%    # Measure changes in sentiment between speeches
  ungroup() %>%                  # Drop grouping before reshaping
  spread(method, adj_score) %>%  # One column of score changes per dictionary
  left_join(bonds_data, by = c("date")) %>%
  na.omit

all_data %>%
  ungroup %>%
  select(-date) %>%
  ggpairs
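The pairwise correlations displayed in the ggpairs panel can also be extracted numerically; a minimal sketch, assuming the all_data frame built above:

# Correlation matrix between sentiment changes and asset returns
all_data %>%
  ungroup() %>%
  select(-date) %>%
  cor(use = "pairwise.complete.obs") %>%
  round(2)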

Limitations of study

There are several limitations in this exploratory study of text sentiment analysis.

  1. The first, and biggest, issue is the dictionaries used. Macroeconomics and central bank speech have very distinctive terminology, which differs from more generalized financial language. As there are no readily available dictionaries for macroeconomic analysis at the moment, building one could improve performance significantly.

  2. Market movements are a result of market expectations versus the central bank's actual stance. Hence, instead of the level of positivity in the speech itself, it may be more useful to compare the relative positivity of news and market commentary leading up to the meeting against the eventual tone of the meeting. However, much more data would need to be collected to discern market sentiment.

Evaluation

There is still a long way to go before we can effectively use text analytics to generate signals. However, if the limitations above can be addressed, the indicator has potential to help glean objective information from news sources and central bank speeches.

Text sentiment analysis can also be applied in other ways, such as identifying words associated with hawkish or dovish moves by comparing the words used in meetings where policy measures were announced against those where they were not.


