knitr::opts_chunk$set(warning = FALSE, message = FALSE, fig.width = 6, fig.height = 4, fig.align = "center")
In this application, we create a regular and a sentiment-adjusted version of the well-known news-based Economic Policy Uncertainty (EPU) index. It fits very well within the package's framework. Besides, the actual EPU index from January 1985 up to July 2018 for the U.S. is part of the package, see sentometrics::epu
.
library("sentometrics") library("data.table") library("quanteda")
By a regular EPU index, we mean to largely follow the core methodology outlined here.
We load and transform the built-in corpus of 4145 U.S. news articles between 1995 and 2014 into a sento_corpus
object. We only include the journal features, indicating article source (one of The Wall Street Journal or The Washington Post).
data("usnews") corpus <- sento_corpus(usnews[, c("id", "date", "texts", "wsj", "wapo")])
sento_lexicons
objectWe take the original keywords used to pinpoint coverage about the economy (E), policy concerns (P) and uncertainty (U), and organize them into a sento_lexicons
object.
keywords <- list( E = c("economy", "economic"), P = c("congress", "legislation", "white house", "regulation", "deficit", "federal reserve"), U = c("uncertainty", "uncertain") ) keywords_dt <- lapply(keywords, function(kw) data.table(x = kw, y = 1)) lex <- sento_lexicons(keywords_dt)
These keywords lexicons are used in the compute_sentiment()
function. For each EPU dimension, the obtained scores represent the number of keywords present in a given news article.
s <- compute_sentiment(corpus, lex, "counts") s[, -c("date", "word_count")]
sentiment
objectAs we are not interested in the number of keywords, but only in whether a keyword was present, we need some adjustments. In particular, we split the sentiment
object and compute an EPU column per newspaper, with a score of 1 if at least two categories' keywords show up, and 0 if not.
sA <- s[, 1:3] sB <- s[, -c(1:3)] to_epu <- function(x) as.numeric(rowSums(x > 0) >= 2) # >= 3 is too strict for this corpus sB[, "EPU--wsj" := to_epu(.SD), .SDcols = endsWith(colnames(sB), "wsj")] sB[, "EPU--wapo" := to_epu(.SD), .SDcols = endsWith(colnames(sB), "wapo")] s2 <- as.sentiment(cbind(sA, sB[, c("EPU--wsj", "EPU--wapo")]))
This new sentiment
object is aggregated into a monthly average and a 12-monthly linear moving average time series per newspaper. Setting howDocs = "equal_weight"
and do.ignoreZeros = FALSE
normalizes the monthly values by the combined number of news articles in that month.
w <- data.frame("simple" = c(rep(0, 11), 1), "linear" = weights_exponential(12, alphas = 10^-10)[, 1]) ctr <- ctr_agg(howDocs = "equal_weight", do.ignoreZeros = FALSE, howTime = "own", by = "month", lag = 12, weights = w) sm <- aggregate(s2, ctr)
The next step is to scale the newspaper-level EPU time series to unit standard deviation before a certain date (in this case, before 2005). Rather than unit standard deviation, we standardize to a standard deviation of 100.
dt <- as.data.table(subset(sm, date < "2005-01-01")) sds <- apply(dt[, -1], 2, sd) sm2 <- scale(sm, center = FALSE, scale = sds/100) subset(sm2, date < "2005-01-01")[["stats"]]
To then obtain the actual EPU index, the newspaper-level series are averaged by reapplying the aggregate()
function. We are left with two series, one a moving average of the other.
sm3 <- aggregate(sm2, features = list(journals = c("wsj", "wapo"))) plot(sm3, "time")
The scale()
function can be used again to achieve final normalizations. For the sake of the example, we shift the indices upwards by a value of 50.
sm4 <- scale(sm3, center = rep(-50, nmeasures(sm3)), scale = FALSE) plot(sm4)
A sentiment-adjusted EPU index adds a layer of news sentiment analysis to the typical creation process. The resulting index will fluctuate between negative and positive values depending on how news writes about topics related to economic policy uncertainty, rather than only analyzing if they write about it.
We start from scratch by reinitializing the corpus.
corpus <- sento_corpus(usnews[, c("id", "date", "texts", "wsj", "wapo")])
We move forward by adding binary features to the corpus for the E, P and U keywords defined earlier. Next, we compute an EPU feature, based on the same mechanism using the self-created to_epu()
function.
corpus <- add_features(corpus, keywords = keywords, do.binary = TRUE) dv <- as.data.table(docvars(corpus)) dv[, EPU := to_epu(.SD), .SDcols = c("E", "P", "U")]
Having detected the news articles to count in the EPU index, we need to normalize these counts. We do so per newspaper using some data.table
magic, and then add the appropriate features to the corpus using add_features()
.
# compute total number of articles per journal and month totArticles <- dv[, date := format(date, "%Y-%m")][, lapply(.SD, sum), by = date, .SDcols = c("wsj", "wapo")] setnames(totArticles, c("wsj", "wapo"), c("wsjT", "wapoT")) dv <- merge(dv, totArticles, by = "date") dv[, c("wsj", "wapo") := list((wsj * EPU) / wsjT, (EPU * wapo) / wapoT)] for (j in which(colnames(dv) %in% c("wsj", "wapo"))) # replace NaN and Inf values due to zero division set(dv, which(is.na(dv[[j]]) | is.infinite(dv[[j]])), j, 0) corpus <- add_features(corpus, featuresdf = dv[, c("wsj", "wapo", "EPU")])
We continue with a subsetted corpus carrying those articles discussing enough EPU material. We clean the features keeping only the normalized newspaper-level features.
corpus <- corpus_subset(corpus, EPU == 1) docvars(corpus, c("E", "P", "U", "EPU")) <- NULL
The news sentiment layer is added by applying the all-at-once sentiment computation and aggregation function sento_measures()
. The popular Harvard General Inquirer is the sentiment lexicon at service. Averaging across the newspaper series gives a final EPU index. Specific scaling as shown before is left aside.
sentLex <- sento_lexicons(sentometrics::list_lexicons[c("GI_en")]) ctr <- ctr_agg("counts", "equal_weight", "equal_weight", by = "month", lag = 12) sm <- sento_measures(corpus, sentLex, ctr) sm2 <- aggregate(sm, features = list(journals = c("wsj", "wapo"))) plot(sm2)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.