# Basic knitr options
library(knitr)
opts_chunk$set(comment = NA, 
               echo = FALSE, 
               warning = FALSE, 
               message = FALSE, 
               error = TRUE, 
               cache = FALSE,
               fig.path = 'figures/')
# Libraries
library(vilaweb)
library(rtweet)
library(tidyverse)
library(databrew)
library(translateR)
library(sentimentr) # https://github.com/trinker/sentimentr
require(RPostgreSQL)
require(readr)  
require(DBI)
library(webshot)
source('prepare_data.R')
# Utility for screenshotting tweets:
# https://github.com/lukehorvat/screenshot-tweet

Summary

The question

Did Jordi Cuixart organize a public and violent uprising against the Spanish State?

The methods

We explore organically generated data from Twitter, Google, Wikipedia, and newspapers to try to answer the above question.

The hypothesis of guilt

If Jordi Cuixart committed a public and violent uprising, we expect the following three conditions to be true:

  1. We expect that social media chatter (number of tweets) mentioning violence and/or Jordi Cuixart, particularly from Cuixart's ideological adversaries (politicians opposed to Catalan independence), will be highest in the immediate aftermath of the violence. By the same token, his entry into prison will not be surprising (given the severity of the acts committed), and should therefore generate relatively fewer tweets than the acts themselves.

  2. We expect that Wikipedia page views and Google searches about Cuixart will be highest in the immediate aftermath of the violence. By the same token, his entry into prison will not be surprising (given the severity of the violent acts committed), and should therefore generate fewer searches and page views than the violent acts themselves.

  3. We expect that newspapers' tweets about Cuixart will be highest in the immediate aftermath of the violence, since a violent uprising is a very "newsworthy" event. Given the severity of the events, his entry into prison should be unsurprising, and should therefore generate less news coverage than the violent events.
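
All three hypotheses share one testable shape: attention in a short window after the alleged uprising should exceed attention in a window of the same length after the imprisonment. The following is a minimal base-R sketch of that comparison, assuming a daily count vector aligned with a vector of dates. The function name `attention_ratio`, the 3-day window, and the toy counts are illustrative assumptions, not the author's code or data.

```r
# Hypothetical helper, not part of the original analysis: compares total
# attention in the `window` days starting on the alleged event with the
# same-length window starting on the imprisonment date.
attention_ratio <- function(dates, counts, event_date, prison_date, window = 3) {
  dates <- as.Date(dates)
  in_window <- function(start) {
    start <- as.Date(start)
    dates >= start & dates < start + window
  }
  event_total  <- sum(counts[in_window(event_date)], na.rm = TRUE)
  prison_total <- sum(counts[in_window(prison_date)], na.rm = TRUE)
  # Under the hypothesis of guilt this ratio should be well above 1.
  # It is Inf if only the post-imprisonment window is empty (NaN if both are).
  event_total / prison_total
}

# Invented daily counts, purely to show the mechanics
toy_dates  <- seq(as.Date("2017-09-18"), as.Date("2017-10-20"), by = "day")
toy_counts <- rep(1, length(toy_dates))
toy_counts[toy_dates == as.Date("2017-10-16")] <- 10
attention_ratio(toy_dates, toy_counts, "2017-09-20", "2017-10-16")  # 0.25
```

A ratio below 1, as in this toy series, is the pattern each "REJECTED" result describes. The window length is a free choice, so it is worth checking that the conclusion holds for several values.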

The results

  1. Hypothesis 1: REJECTED. Social media chatter about violence and Cuixart does not peak immediately after the supposed violent uprising, but rather after the entry into prison.

  2. Hypothesis 2: REJECTED. Wikipedia page views and Google searches about Cuixart do not peak immediately after the supposed violent uprising, but rather after the entry into prison.

  3. Hypothesis 3: REJECTED. Newspaper coverage about Cuixart does not peak immediately after the supposed violent uprising, but rather after the entry into prison.

Conclusion

A public and violent uprising, were it to have taken place, would be a major, attention-worthy event. In the case of Jordi Cuixart, however, attention from social media, Wikipedia, search engines, and traditional Spanish and international media did not follow the pattern of a truly violent event.

Rather, online attention to Cuixart peaked with the judiciary's decision to imprison him. In other words, imprisoning Cuixart was considered more attention-worthy and/or surprising than his supposed crime. This is inconsistent with an act as severe as rebellion: given the severity of the supposed act, we would reasonably expect the greatest amount of media chatter to take place immediately afterwards, not a month later. Likewise, given the severity of the act, we would expect the imprisonment of its author to be only a minor news event. Both of these expectations are contradicted by the data.

The data regarding the events of September/October 2017 are not consistent with a violent crime. The fact that Mr. Cuixart's imprisonment garnered more attention than his supposed actions is also not suggestive of a violent crime, but rather of disproportionate imprisonment.

Analysis

Finding 1: Politicians' Twitter data does not indicate violence.

A.

Social media chatter about violence and Jordi Cuixart, even when limited to Cuixart's ideological adversaries (politicians opposed to Catalan independence), does not peak in the days immediately following the supposed crime.

n_people <- length(unique(people_tweets$username))
n_tweets <- nrow(people_tweets)

We harvested tweets from `r n_people` well-known Spanish/Catalan politicians and political parties/groups during the supposed violent uprising organized by Cuixart (September 20, 2017) and the following two days (September 21 and 22, 2017). The list was intentionally restricted to accounts known to oppose Catalan independence. The `r n_people` accounts are listed below:

cat(paste0(sort(unique(people_tweets$username)), collapse = '\n'))

During the 3 days analyzed (Sep 20-22, 2017), the `r n_people` accounts generated `r n_tweets` tweets.

pd <- people_tweets %>%
  mutate(is_cuixart = grepl('cuixart', tolower(tweet)),
         is_jordis = grepl('jordis', tolower(tweet)) &
           !grepl('jordisa', tolower(tweet)),
         is_omnium = grepl('omnium', tolower(tweet)),
         is_violence = detect_violence(tolower(tweet)),
         is_rebelion = grepl('rebel', tolower(tweet)),
         is_sedition = grepl('sedici', tolower(tweet)),
         is_alzamiento = grepl('alzamiento|alzar', tolower(tweet)),
         is_tumulto = grepl('tumult', tolower(tweet))) %>%
  mutate(is_rebsed = is_rebelion | is_sedition)

show_screenshots <- function(the_ids,
                             the_data = picture_df){
  out <- 
    the_data %>%
    filter(id %in% the_ids)
  if(nrow(out) > 0){
    for(i in 1:nrow(out)){
      cat(paste0('\n*', as.character(out$the_dates[i]), ' ', as.character(out$the_times[i]), ' ', as.character(out$the_timezones[i]), '*\n'))
      cat(paste0('\n![screenshots/', out$file[i],'](screenshots/', out$file[i], ')\n'))
    }
  }
}

Of these `r n_tweets` tweets:

# show_screenshots(the_ids = pd$id[pd$is_cuixart])
show_screenshots(the_ids = pd$id[pd$is_jordis])
show_screenshots(the_ids = pd$id[pd$is_omnium])
show_screenshots(the_ids = pd$id[pd$is_alzamiento])
show_screenshots(the_ids = pd$id[pd$is_tumulto])
kable(pd %>% filter(is_violence) %>% dplyr::select(date, time, username, tweet))
# show_screenshots(the_ids = pd$id[pd$is_violence])
# show_screenshots(the_ids = pd$id[pd$is_rebsed])
kable(pd %>% filter(is_rebsed) %>% dplyr::select(date, time, username, tweet))

In other words, despite more than 1,000 tweets in the 3-day period from these politicians, all of whom are very critical of Mr. Cuixart's political aims, none of them mentioned him participating in, organizing, promoting, or carrying out any violent acts. The only mention of a potential crime related to Cuixart is a news story about the Fiscalía (the public prosecutor's office), not about any violence that actually occurred. If there was a violent rebellion, why did none of these politicians mention it?

B.

During actual violent events, the use of the words "violent" or "violence" corresponds closely with the actual occurrence of violent events.

The chart below shows tweets containing any word with the stem "violen" (i.e., "violencia", "violento", "violenta", etc.) from the same group of politicians during the period immediately before and after the events for which Mr. Cuixart is being criminally charged. There is no noticeable increase in chatter about violence on or immediately after the dates of the supposed violence.
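The flag itself comes from `detect_violence()`, which is defined in `prepare_data.R` and not shown here. Below is a minimal sketch of a stem matcher, assuming that helper works like a plain "violen" substring test; `detect_violence_sketch` is a hypothetical name. Its last example shows the limitation behind the September 28th note below: a stem match cannot separate gender violence from political violence.

```r
# Hypothetical stand-in for detect_violence() from prepare_data.R (not shown);
# it assumes the helper is a simple, case-insensitive match on the stem "violen".
detect_violence_sketch <- function(text) {
  grepl("violen", tolower(text), fixed = TRUE)
}

tweets <- c("Condenamos la violencia",
            "Un acto violento",
            "Violent clashes in Charlottesville",
            "Manifestación pacífica",
            "Pacto de Estado contra la violencia de género")
detect_violence_sketch(tweets)  # TRUE TRUE TRUE FALSE TRUE
```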

plot_data <- people_tweets_long %>%
  mutate(is_violence = detect_violence(tolower(tweet))) %>%
  filter(is_violence) %>%
  group_by(date, username) %>%
  tally
left <- expand.grid(date = seq(min(plot_data$date),
                              max(plot_data$date),
                              by = 1),
                   username = sort(unique(plot_data$username)))
plot_data <- left_join(left, plot_data)
plot_data <- plot_data %>% filter(date >= '2017-09-10',
                                  date <= '2017-10-20')
ggplot(data = plot_data,
       aes(x = date,
           y = n,
           fill = username)) +
  geom_bar(stat = 'identity',alpha = 0.9) +
    # geom_smooth(se = FALSE) +
  theme_databrew() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_x_date(breaks = unique(plot_data$date)) +
  labs(x = 'Date',
       y = 'Tweets',
       title = 'Tweets by Spanish politicians with violence-related words',
       subtitle = 'Words: "Violence, violent". Red line at date of supposed violence.') +
  scale_fill_manual(name = '',
                    values = databrew::make_colors(n = length(unique(plot_data$username)))) +
  geom_vline(xintercept = as.Date('2017-09-20'),
             color = 'red',
             alpha = 0.7,
             lty = 2)

(The September 28th peak is largely due to tweets about an agreement on gender violence.)

Compare the above with the frequency of these words (in English) during a truly violent event. In August 2017, violent protests took place in Charlottesville, Virginia (USA). White supremacists marched, and many counter-protesters also gathered. There were violent clashes between the two groups, and on August 12th a man drove his car into a crowd, killing one person.

The below shows the frequency of words with the root "violen" ("violence", "violent", etc.) from a random sample of US politicians (congressmen). The congressmen are listed below:

cat(paste0(sort(unique(usa$username)), collapse = '\n'))

Note that, unlike in the previous chart, the peak in violence-related chatter coincides with the actual date of the violent events.

pd <- usa %>%
  mutate(is_violence = grepl('violen', tolower(tweet))) %>%
  filter(is_violence) %>%
  # Adjust for time zone
  mutate(date = as.numeric(date)) %>%
  mutate(date = ifelse(as.numeric(substr(time, 1, 2)) <6,
                       date - 1,
                       date)) %>%
  mutate(date = as.Date(date, origin ='1970-01-01')) %>%
  group_by(date,username) %>%
  tally
left <- expand.grid(date = seq(min(pd$date),
                              max(pd$date),
                              by = 1),
                   username = sort(unique(pd$username)))
pd <- left_join(left, pd)
pd <- pd %>% filter(date >= '2017-08-01',
                    date <= '2017-08-30')
ggplot(data = pd,
       aes(x = date,
           y = n,
           fill = username)) +
  geom_bar(stat = 'identity',alpha = 0.9) +
  # geom_smooth(se = FALSE) +
  theme_databrew() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_x_date(breaks = unique(pd$date)) +
  labs(x = 'Date',
       y = 'Tweets',
       title = 'Tweets by US Congressmen with violence-related words',
       subtitle = 'Words: "Violence, violent". Red line at August 12.') +
  scale_fill_manual(name = '',
                    values = databrew::make_colors(n = length(unique(pd$username)))) +
  geom_vline(xintercept = as.Date('2017-08-12'),
             color = 'red',
             alpha = 0.7,
             lty = 2)
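
The chunk above approximates a time-zone correction by moving every tweet stamped before 06:00 back one day. That is a single offset for a country spanning several time zones. Below is a minimal sketch of a per-tweet conversion. It assumes the `date` and `time` columns hold UTC and that US Eastern time is an acceptable reference. The helper `local_tweet_date` is hypothetical; it uses only base R's `as.POSIXct()` and `format()`.

```r
# Hypothetical alternative to the "hour < 6" shift: interpret each timestamp
# as UTC (an assumption about how the tweets were stored), render it in a
# named time zone, and keep the local calendar date.
local_tweet_date <- function(date, time, tz = "America/New_York") {
  utc <- as.POSIXct(paste(date, time), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  as.Date(format(utc, format = "%Y-%m-%d", tz = tz))
}

local_tweet_date("2017-08-13", "02:30:00")  # 22:30 EDT on 12 August
local_tweet_date("2017-08-13", "15:00:00")  # 11:00 EDT on 13 August
```

Because `tz` is an argument, it could be set per account (for example, from each member's home state) instead of using one reference zone for everyone.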

When true violence occurs, politicians notice. And they tweet about it immediately. Why did none of these politicians tweet about the violence carried out by Mr. Cuixart in the immediate days afterwards?

C.

Social media chatter about Cuixart was much higher at the time of the entry into prison than at the time of the supposed violent events.

The below shows tweets from the same group of politicians mentioning the name of Mr. Cuixart.

plot_data <- people_tweets_long %>%
  mutate(is_jordi = grepl('cuixart', tolower(tweet))) %>%
  filter(is_jordi) %>%
  group_by(date, username) %>%
  tally
left <- expand.grid(date = seq(min(people_tweets_long$date),
                              max(people_tweets_long$date),
                              by = 1),
                   username = sort(unique(people_tweets_long$username)))
plot_data <- left_join(left, plot_data)
plot_data <- plot_data %>% filter(date >= '2017-09-10',
                                  date <= '2017-10-20')
plot_data$n[is.na(plot_data$n)] <- 0
ggplot(data = plot_data,
       aes(x = date,
           y = n,
           fill = username)) +
  geom_bar(stat = 'identity',alpha = 0.9) +
    # geom_smooth(se = FALSE) +
  theme_databrew() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_x_date(breaks = unique(plot_data$date)) +
  labs(x = 'Date',
       y = 'Tweets',
       title = 'Tweets by Spanish politicians with the word "Cuixart"',
       subtitle = 'Red line at date of supposed violence.') +
  scale_fill_manual(name = '',
                    values = databrew::make_colors(n = length(unique(plot_data$username)))) +
  geom_vline(xintercept = as.Date('2017-09-20'),
             color = 'red',
             alpha = 0.7,
             lty = 2)
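
Unlike the first violence chart, this chunk explicitly sets missing counts to zero after joining the observed counts onto a complete date-by-account grid built with `expand.grid()`. The following sketch shows the same idea in base R on toy data; the account names are hypothetical. Passing `stringsAsFactors = FALSE` keeps `username` a character column, so the join does not have to reconcile a factor with a character vector.

```r
# Toy data (invented): relevant tweets per account per day, with gaps on
# days when an account posted nothing that matched.
counts <- data.frame(
  date     = as.Date(c("2017-09-20", "2017-09-22")),
  username = c("account_a", "account_b"),
  n        = c(2L, 1L),
  stringsAsFactors = FALSE
)

# Build the full date-by-account grid, then left-join the observed counts
grid <- expand.grid(
  date     = seq(as.Date("2017-09-20"), as.Date("2017-09-22"), by = "day"),
  username = c("account_a", "account_b"),
  stringsAsFactors = FALSE
)
filled <- merge(grid, counts, by = c("date", "username"), all.x = TRUE)
filled$n[is.na(filled$n)] <- 0L  # days with no matching tweets count as zero
filled
```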

Clearly, despite his supposedly carrying out a rebellion, none of the politicians in question even tweeted his name in the days following September 20. Not once. How is it that Mr. Cuixart was able to carry out a "public" and "violent" uprising without any politicians noticing?

Finding 2: Wikipedia and Google data do not indicate violence

A.

Wikipedia page views for Cuixart do not peak immediately after the supposed violent uprising, but rather after the entry into prison.

In the case of public, violent events, citizens react by searching for information on the perpetrators of the violence. They use search engines (Google) and online encyclopedias (Wikipedia) to learn more about those responsible.

The chart below shows all Wikipedia page views for Jordi Cuixart and Jordi Sànchez in the period immediately before and after the supposed violent uprising. Note that the major increase in views does not come at the time of the events in question, but rather around October 16th, when they were placed in pre-trial detention.
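The page-view table `pv` is built in `prepare_data.R`, which is not shown. One public source for such counts is the Wikimedia REST API's per-article pageviews endpoint. The following sketch only builds the request URLs, one per language edition named in the chart subtitle. That `pv` was actually assembled this way is an assumption, and `pageview_urls` is a hypothetical helper. Article titles can differ between language editions, so a real query may need a different title per language.

```r
# Hypothetical helper: builds Wikimedia REST API per-article pageview URLs
# (daily granularity, all access methods, agent "user" to exclude traffic
# classified as crawlers). Assumes the same title exists in every edition.
pageview_urls <- function(article, langs = c("es", "ca", "en"),
                          start = "20170915", end = "20171020") {
  title <- utils::URLencode(gsub(" ", "_", article), reserved = TRUE)
  paste0("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/",
         langs, ".wikipedia.org/all-access/user/", title,
         "/daily/", start, "/", end)
}

pageview_urls("Jordi Cuixart")
```

Each URL returns JSON with one item per day. Summing the daily views across the three editions reproduces the aggregation done in the chunk below.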

wiki <- pv %>%
  filter(person %in% c('Jordi Cuixart', 'Jordi Sànchez')) %>%
  filter(date >= '2017-09-15',
         date <= '2017-10-20')

pd <- wiki %>%
    group_by(date,person) %>%
  summarise(n = sum(views, na.rm = TRUE))
left <- tibble(date = seq(min(pd$date),
                              max(pd$date),
                              by = 1))
pd <- left_join(left, pd)

ggplot(data = pd,
       aes(x = date,
           y = n,
           group = person)) +
  geom_bar(stat = 'identity',alpha = 0.6) +
  # geom_smooth(se = FALSE) +
  theme_databrew() +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 16)) +
  scale_x_date(breaks = unique(pd$date)) +
  labs(x = 'Date',
       y = 'Views',
       title = 'Wikipedia page-views for Jordi Cuixart and Jordi Sànchez',
       subtitle = 'Spanish, Catalan, and English languages. Red line at date of supposed violence.') +
  facet_wrap(~person, scales = 'free_y',
             ncol = 1) +
  geom_vline(xintercept = as.Date('2017-09-20'),
             color = 'red',
             alpha = 0.7,
             lty = 2)

In other words, the public did not find the events of September 20-21 sufficiently noteworthy to merit much attention. However, they did find the events of October 16 noteworthy. That is, there appears to be more interest (or surprise) in the judicial process than in the events on which that judicial process is based.

B.

Wikipedia page views for the organizers of truly violent events increase rapidly in the period immediately after the violence.

Contrast the above with the chart below, showing Wikipedia page views for the organizers of the violent Charlottesville protests in August 2017.

wiki <- pv_char %>%
  # filter(person == 'Jordi Cuixart') %>%
  filter(date >= '2017-08-01',
         date <= '2017-08-31')

pd <- wiki %>%
    group_by(date,person) %>%
  summarise(n = sum(views, na.rm = TRUE))
left <- tibble(date = seq(min(pd$date),
                              max(pd$date),
                              by = 1))
pd <- left_join(left, pd)

ggplot(data = pd,
       aes(x = date,
           y = n)) +
  geom_bar(stat = 'identity',alpha = 0.6) +
  # geom_smooth(se = FALSE) +
  theme_databrew() +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 15)) +
  scale_x_date(breaks = unique(pd$date)) +
  labs(x = 'Date',
       y = 'Views',
       title = 'Wikipedia page-views for Jason Kessler and Richard Spencer',
       subtitle = 'English language. Red line at date of violence.') +
  facet_wrap(~person,
             scales = 'free_y',
             ncol = 1) +
  geom_vline(xintercept = as.Date('2017-08-12'),
             color = 'red',
             alpha = 0.7,
             lty = 2)

In the above chart, the timing of the violence and the timing of public interest in its organizers line up closely. Acts of violence took place on August 12th, and interest in the organizers peaked in the days immediately following. This is markedly different from the interest profile for Mr. Cuixart.
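The contrast drawn here is about timing: how many days separate the violence from the peak in attention. The following sketch measures that lag, assuming a complete daily series. `days_to_peak` is a hypothetical helper, and the toy counts are invented. Applied to the Cuixart series with 2017-09-20 as the event date, the lag would come out at 26 days if the peak falls on October 16.

```r
# Hypothetical helper: days between a reference event and the day of peak
# attention in a daily series (which.max returns the first peak on ties).
days_to_peak <- function(dates, counts, event_date) {
  dates <- as.Date(dates)
  peak_date <- dates[which.max(counts)]
  as.numeric(peak_date - as.Date(event_date))
}

# Invented counts with a single spike one day after the event
toy_dates  <- seq(as.Date("2017-08-01"), as.Date("2017-08-31"), by = "day")
toy_counts <- rep(5, length(toy_dates))
toy_counts[toy_dates == as.Date("2017-08-13")] <- 100
days_to_peak(toy_dates, toy_counts, "2017-08-12")  # 1
```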

C.

The frequency of Google searches for Cuixart does not peak immediately after the supposed violent uprising, but rather after the entry into prison.

Google is used far more often than Wikipedia, and therefore offers a larger sample. The chart below shows weekly Google search interest in Jordi Cuixart, expressed as a percentage of the peak week. The date on the x axis is the last day of the week in question.
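Google Trends does not publish raw search counts. It reports each period's interest relative to the busiest period in the chosen time range and region, on a 0 to 100 scale. The following sketch illustrates that scaling on invented weekly volumes; `relative_interest` is a hypothetical name, and Google's own sampling and rounding are not reproduced. It shows why the "4%" figure for Cuixart is a ratio between two weeks on the same chart, not an absolute volume.

```r
# Hypothetical illustration of Google Trends-style scaling: each value is
# expressed relative to the busiest week in the window (peak = 100).
relative_interest <- function(x) {
  round(100 * x / max(x, na.rm = TRUE))
}

toy_weekly <- c(40, 1000, 30)  # invented raw volumes, not real data
relative_interest(toy_weekly)  # 4 100 3
```

Because the scale is recomputed for every query, the charts for different people can be compared by shape (when the peak occurs) but not by height.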

google_chart <- function(person = 'Jordi Cuixart',
                         dates = c('2017-07-01', '2017-12-30'),
                         red = NULL){
  pd <- gt %>%
    filter(keyword == person) %>%
    filter(date >= dates[1],
           date <= dates[2])
  g <- ggplot(data = pd,
         aes(x = date,
             y = hits)) +
    # geom_line() +
    geom_bar(stat = 'identity') +
    theme_databrew() +
    labs(x = 'Week ending on',
         y = 'Relative search load (%)',
         title = paste0('Google search trends for ', person)) +
    scale_x_date(breaks = unique(pd$date)) +
    theme(axis.text.x = element_text(angle = 90))
  if(!is.null(red)){
    g <- g + 
      geom_vline(xintercept = as.Date(red),
             color = 'red',
             alpha = 0.7,
             lty = 2)
  }
  return(g)
}

google_chart('Jordi Cuixart', red = '2017-09-23')

During the week of the supposed uprising, interest in Cuixart was only 4% of the interest during the week of his entry into prison. How could it be that he carried out a violent uprising, but the public did not find him sufficiently interesting to google him?

Incidentally, the interest curve for Jordi Sànchez is very similar. Clearly, the public did not find the events of Sep 20-21 to be sufficiently concerning or worrying to merit much attention.

google_chart('Jordi Sànchez', red = '2017-09-23')

D.

Google searches for the organizers of truly violent events increase rapidly in the period immediately after the violence.

Contrast the above (a case of the public being more interested in judicial events than in a supposedly violent uprising) with the charts below (a case of genuine interest in real violent events).

Interest in Jason Kessler, one of the organizers of the Charlottesville protest, peaked during the violence.

google_chart('Jason Kessler',
             dates = as.Date(c('2017-07-01',
                               '2017-09-30')), red = '2017-08-12')

By the same token, interest in Richard Spencer, another of the organizers, peaked during the violence.

google_chart('Richard Spencer',
             dates = as.Date(c('2017-07-01',
                               '2017-09-30')), red = '2017-08-12')

The interest curves in the above charts are markedly different for Jordi Cuixart and Jordi Sànchez compared with Jason Kessler and Richard Spencer. In the case of the former two, the events they organized garnered very little attention from the public. Apparently, the public did not find their events interesting, concerning, or newsworthy. By contrast, interest in the organizers of the Charlottesville protest coincided with the protests themselves: the violence prompted the public to carry out more searches.

The difference in these curves suggests a significant difference in the reality of the events they reflect. The curves for Kessler and Spencer are suggestive of real violence. The curves for Cuixart and Sànchez are not.

Finding 3: Newspaper Twitter data does not indicate violence

A.

Newspaper coverage about Cuixart does not peak immediately after the supposed violent uprising, but rather after the entry into prison.

n_papers <- length(newspapers)
n_international <- length(unique(tolower(news$user_name)))

We harvested tweets from `r n_papers` Spanish newspapers. We intentionally restricted our analysis to newspapers with a known anti-independence editorial position, so as to increase the likelihood of finding a "signal" in the "noise". The newspapers are listed below:

cat(paste0(sort(unique(newspaper_tweets$username)), collapse = '\n'))

As a comparison, we also harvested tweets from `r n_international` international newspapers, listed below:

cat(paste0(sort(unique(news$user_name)), collapse = '\n'))

The below chart shows the use of the words "violent" and "violence" in tweets from the Spanish newspapers. The vertical red line shows the date of the supposed violent uprising organized by Jordi Cuixart and Jordi Sànchez.

Clearly, there is no peak in tweets about violence during or after the events for which Mr. Cuixart is being charged.

plot_data <- newspaper_tweets %>%
  mutate(is_violence = detect_violence(tolower(tweet))) %>%
  # filter(is_violence) %>%
 group_by(date,username) %>%
  summarise(n = length(which(is_violence)),
            denom = n())
left <- expand.grid(date = seq(min(plot_data$date),
                              max(plot_data$date),
                              by = 1),
                   username = sort(unique(plot_data$username)))
plot_data <- left_join(left, plot_data)
plot_data <- plot_data %>% filter(date >= '2017-09-10',
                                  date <= '2017-10-20')
plot_data$n <- ifelse(is.na(plot_data$n), 0, plot_data$n)
ggplot(data = plot_data,
       aes(x = date,
           y = n,
           fill = username)) +
  geom_bar(stat = 'identity',alpha = 0.9) +
    # geom_smooth(se = FALSE) +
  theme_databrew() +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 15)) +
  scale_x_date(breaks = unique(plot_data$date)) +
  labs(x = 'Date',
       y = 'Tweets',
       title = 'Tweets by Spanish newspapers with violence-related words',
       subtitle = 'Words: "Violence, violent". Red line = 20 September 2017') +
  scale_fill_manual(name = '',
                    values = databrew::make_colors(n = length(unique(plot_data$username)))) +
   geom_vline(xintercept = as.Date('2017-09-20'),
             color = 'red',
             alpha = 0.7,
             lty = 2)
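
The chunk above computes `denom`, the total number of tweets per outlet per day, but plots only the raw count `n`. Raw counts rise whenever an outlet simply tweets more, for instance on a busy news day. The following sketch shows a normalized variant that divides `n` by `denom`, on toy data in the same shape; the outlet names and numbers are hypothetical. Days added by the grid join have no `denom`, so the share is left as `NA` there rather than divided by zero.

```r
# Toy data (invented) in the shape produced by the chunk above:
# n = violence-related tweets, denom = all tweets, per outlet per day.
daily <- data.frame(
  date     = as.Date(c("2017-09-20", "2017-09-20", "2017-09-21")),
  username = c("paper_a", "paper_b", "paper_a"),
  n        = c(2, 0, 5),
  denom    = c(100, 40, 50),
  stringsAsFactors = FALSE
)

# Share of each outlet's output that mentions violence; NA when the outlet
# posted nothing that day (denom missing or zero).
daily$share <- ifelse(is.na(daily$denom) | daily$denom == 0,
                      NA_real_, daily$n / daily$denom)
daily$share  # 0.02 0.00 0.10
```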

We can restrict the analysis further by showing only tweets which contained a violence-related term (violent or violence) AND a mention of Spain, Barcelona, or Catalonia. The below shows this filtering. Again, the mentions of violence are much higher on other dates than on the dates immediately after the protest for which Cuixart is being charged.
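The extra condition is a logical AND of two pattern matches. The following sketch assumes `detect_violence()` behaves like a "violen" stem match, since its definition in `prepare_data.R` is not shown. `violence_in_place` is a hypothetical name. The default place pattern adds "spain" to the one used in the chunk below so that it also works on English-language text ("españ|espan" already covers both "España" and the Catalan "Espanya").

```r
# Hypothetical sketch of the combined filter: a violence stem AND a
# geographic stem, both matched case-insensitively.
violence_in_place <- function(text,
                              place = "españ|espan|spain|catal|barcelon") {
  lower <- tolower(text)
  grepl("violen", lower, fixed = TRUE) & grepl(place, lower)
}

violence_in_place(c("Violencia en Catalunya",
                    "Violencia en Charlottesville",
                    "Fiesta en Barcelona",
                    "Violent clashes in Spain"))  # TRUE FALSE FALSE TRUE
```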

plot_data <- newspaper_tweets %>%
  mutate(is_violence = detect_violence(tolower(tweet)) &
           grepl('españ|espan|catal|barcelon', tolower(tweet))) %>%
  # filter(is_violence) %>%
 group_by(date,username) %>%
  summarise(n = length(which(is_violence)),
            denom = n())
left <- expand.grid(date = seq(min(plot_data$date),
                              max(plot_data$date),
                              by = 1),
                   username = sort(unique(plot_data$username)))
plot_data <- left_join(left, plot_data)
plot_data <- plot_data %>% filter(date >= '2017-09-10',
                                  date <= '2017-10-20')
plot_data$n <- ifelse(is.na(plot_data$n), 0, plot_data$n)
ggplot(data = plot_data,
       aes(x = date,
           y = n,
           fill = username)) +
  geom_bar(stat = 'identity',alpha = 0.9) +
    # geom_smooth(se = FALSE) +
  theme_databrew() +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 15)) +
  scale_x_date(breaks = unique(plot_data$date)) +
  labs(x = 'Date',
       y = 'Tweets',
       title = 'Tweets by Spanish newspapers with violence-related words',
       subtitle = 'Words: "Violence, violent" and Barcelona, Catalonia, or Spain. Red line = 20 September 2017') +
  scale_fill_manual(name = '',
                    values = databrew::make_colors(n = length(unique(plot_data$username)))) +
   geom_vline(xintercept = as.Date('2017-09-20'),
             color = 'red',
             alpha = 0.7,
             lty = 2)
# Show actual tweets
show <- newspaper_tweets %>%
  mutate(is_violence = detect_violence(tolower(tweet)) &
           grepl('españ|espan|catal|barcelon', tolower(tweet))) %>%
 filter(is_violence) %>% filter(date>='2017-09-20', date<= '2017-09-22')

Furthermore, of the only `r nrow(show)` tweets mentioning both violence and Spain, Catalonia, or Barcelona during the 3-day period from Sep 20-22, none specifically references Cuixart or his protest.

These tweets are shown below.

kable(show %>% dplyr::select(date, time, username, tweet))

The below shows the same time period, but for international news sources. Their mentions of violence do not increase during or immediately after the protests carried out by Mr. Cuixart.

plot_data <- tl %>%
  filter(date >= '2017-09-10',
         date <= '2017-10-20') %>%
  filter(username %in% tolower(news$user_name)) %>%
  mutate(is_violence = detect_violence(tolower(tweet))) %>%
  # filter(is_violence) %>%
 group_by(date,username) %>%
  summarise(n = length(which(is_violence)),
            denom = n())
left <- expand.grid(date = seq(min(plot_data$date),
                              max(plot_data$date),
                              by = 1),
                   username = sort(unique(plot_data$username)))
plot_data <- left_join(left, plot_data)

plot_data$n <- ifelse(is.na(plot_data$n), 0, plot_data$n)
ggplot(data = plot_data,
       aes(x = date,
           y = n,
           fill = username)) +
  geom_bar(stat = 'identity',alpha = 0.9) +
    # geom_smooth(se = FALSE) +
  theme_databrew() +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 15)) +
  scale_x_date(breaks = unique(plot_data$date)) +
  labs(x = 'Date',
       y = 'Tweets',
       title = 'Tweets by international newspapers with violence-related words',
       subtitle = 'Words: "Violence, violent". Red line = 20 September 2017') +
  scale_fill_manual(name = '',
                    values = databrew::make_colors(n = length(unique(plot_data$username)))) +
  # Only the 2017 line belongs on this chart; a line at 2018-07-24 (taxi strike)
  # would stretch the x axis far beyond the filtered date range
  geom_vline(xintercept = as.Date('2017-09-20'),
             color = 'red',
             alpha = 0.7,
             lty = 2)

Since international news sources cover many locations, we can restrict our analysis geographically. The below shows the same time period for international news sources, but restricting only to those tweets with a reference to Spain, Catalonia, or Barcelona. Note that there are no mentions of violence in the days in question.

plot_data <- tl %>%
  filter(date >= '2017-09-10',
         date <= '2017-10-20') %>%
  filter(username %in% tolower(news$user_name)) %>%
  mutate(is_violence = detect_violence(tolower(tweet)) &
           # English-language outlets write "Spain", which 'espan' does not match
           grepl('españ|espan|spain|catal|barcelon', tolower(tweet))) %>%
  # filter(is_violence) %>%
 group_by(date,username) %>%
  summarise(n = length(which(is_violence)),
            denom = n())
left <- expand.grid(date = seq(min(plot_data$date),
                              max(plot_data$date),
                              by = 1),
                   username = sort(unique(plot_data$username)))
plot_data <- left_join(left, plot_data)

plot_data$n <- ifelse(is.na(plot_data$n), 0, plot_data$n)
ggplot(data = plot_data,
       aes(x = date,
           y = n,
           fill = username)) +
  geom_bar(stat = 'identity',alpha = 0.9) +
    # geom_smooth(se = FALSE) +
  theme_databrew() +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 15)) +
  scale_x_date(breaks = unique(plot_data$date)) +
  labs(x = 'Date',
       y = 'Tweets',
       title = 'Tweets by international newspapers with violence-related words',
       subtitle = 'Words: "Violence, violent" and "Spain", "Catalonia", or "Barcelona". Red line = 20 September 2017') +
  scale_fill_manual(name = '',
                    values = databrew::make_colors(n = length(unique(plot_data$username)))) +
  # Only the 2017 line belongs on this chart; a line at 2018-07-24 (taxi strike)
  # would stretch the x axis far beyond the filtered date range
  geom_vline(xintercept = as.Date('2017-09-20'),
             color = 'red',
             alpha = 0.7,
             lty = 2)

Is it conceivable that a public, violent uprising took place, and neither the Spanish nor international media thought that it was newsworthy?

Could it be that these newspapers simply do not cover violent protest stories? Let's take a look at other violent events and see.

B.

During actual violent events, the frequency of violence-related words corresponds chronologically with the violence.

If we look only at tweets using violence-related words together with a geographic reference to Charlottesville ("Charlottesville, Estados Unidos, EEUU, América"), we see that the newspapers' coverage of violence accurately tracks the real incidence of violent events.

plot_data <- tl%>%
  filter(username %in% newspapers) %>%
  mutate(is_violence = detect_violence(tolower(tweet)) &
           grepl('charlott|estados unidos|eeuu|américa', tolower(tweet))) %>% 
 group_by(date,username) %>%
  summarise(n = length(which(is_violence)),
            denom = n())
left <- expand.grid(date = seq(min(plot_data$date),
                              max(plot_data$date),
                              by = 1),
                   username = sort(unique(plot_data$username)))
plot_data <- left_join(left, plot_data)
plot_data <- plot_data %>% filter(date >= '2017-08-01',
                                  date <= '2017-08-30')
plot_data$n <- ifelse(is.na(plot_data$n), 0, plot_data$n)
ggplot(data = plot_data,
       aes(x = date,
           y = n,
           fill = username)) +
  geom_bar(stat = 'identity',alpha = 0.9) +
    # geom_smooth(se = FALSE) +
  theme_databrew() +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 15)) +
  scale_x_date(breaks = unique(plot_data$date)) +
  labs(x = 'Date',
       y = 'Tweets',
       title = 'Tweets by Spanish newspapers with violence-related words',
       subtitle = 'Words: "Violence, violent" and geo-reference to Charlottesville. Red line = 12 August 2017') +
  scale_fill_manual(name = '',
                    values = databrew::make_colors(n = length(unique(plot_data$username)))) +
   geom_vline(xintercept = as.Date('2017-08-12'),
             color = 'red',
             alpha = 0.7,
             lty = 2)

The above demonstrates that these papers do tweet promptly about violent protests. Why, then, did they not tweet about Mr. Cuixart's protest? Could it be that there was no significant violence?

Let's examine the same period for the international newspapers (those that did not tweet about violence in Spain/Catalonia/Barcelona at all from Sep 20-22, 2017). As in the chart above, the chart below shows tweets with violence-related words that also carry a relevant geographic reference (Charlottesville, USA, Virginia). Note that the frequency of violence-related tweets coincides with the actual violent events.

plot_data <- tl%>%
  filter(username %in% tolower(news$user_name)) %>%
  mutate(is_violence = detect_violence(tolower(tweet)) &
           # \\b keeps 'usa' from matching inside words such as "causa" or "thousand"
           grepl('charlott|\\busa\\b|virginia', tolower(tweet))) %>% 
 group_by(date,username) %>%
  summarise(n = length(which(is_violence)),
            denom = n())
left <- expand.grid(date = seq(min(plot_data$date),
                              max(plot_data$date),
                              by = 1),
                   username = sort(unique(plot_data$username)))
plot_data <- left_join(left, plot_data)
plot_data <- plot_data %>% filter(date >= '2017-08-01',
                    date <= '2017-08-30')
plot_data$n <- ifelse(is.na(plot_data$n), 0, plot_data$n)
ggplot(data = plot_data,
       aes(x = date,
           y = n,
           fill = username)) +
  geom_bar(stat = 'identity',alpha = 0.9) +
    # geom_smooth(se = FALSE) +
  theme_databrew() +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 15)) +
  scale_x_date(breaks = unique(plot_data$date)) +
  labs(x = 'Date',
       y = 'Tweets',
       title = 'Tweets by international newspapers with violence-related words',
       subtitle = 'Words: "Violence, violent" and geo-reference to Charlottesville. Red line = 12 August 2017') +
  scale_fill_manual(name = '',
                    values = databrew::make_colors(n = length(unique(plot_data$username)))) +
   geom_vline(xintercept = as.Date('2017-08-12'),
             color = 'red',
             alpha = 0.7,
             lty = 2)

As another point of comparison, the below shows the Spanish newspapers' frequency of tweets with violence-related words during the July 2018 Barcelona taxi strike.

plot_data <- tl%>%
  filter(username %in% newspapers) %>%
  mutate(is_violence = detect_violence(tolower(tweet))) %>% 
 group_by(date,username) %>%
  summarise(n = length(which(is_violence)),
            denom = n())
left <- expand.grid(date = seq(min(plot_data$date),
                              max(plot_data$date),
                              by = 1),
                   username = sort(unique(plot_data$username)))
plot_data <- left_join(left, plot_data)
plot_data <- plot_data %>% filter(date >= '2018-07-13',
                    date <= '2018-08-08')
plot_data$n <- ifelse(is.na(plot_data$n), 0, plot_data$n)
ggplot(data = plot_data,
       aes(x = date,
           y = n,
           fill = username)) +
  geom_bar(stat = 'identity',alpha = 0.9) +
    # geom_smooth(se = FALSE) +
  theme_databrew() +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 15)) +
  scale_x_date(breaks = unique(plot_data$date)) +
  labs(x = 'Date',
       y = 'Tweets',
       title = 'Tweets by Spanish newspapers with violence-related words',
       subtitle = 'Words: "Violence, violent". Red line = 24 July 2018') +
  scale_fill_manual(name = '',
                    values = databrew::make_colors(n = length(unique(plot_data$username)))) +
   geom_vline(xintercept = as.Date('2018-07-24'),
             color = 'red',
             alpha = 0.7,
             lty = 2)

The chart below is identical to the one above, but counts only tweets which contain both a violence word ("violencia", "violento") and a relevant geographical tag ("España", "Cataluña", "catalán", "catalana", "Barcelona").

plot_data <- tl%>%
  filter(username %in% newspapers) %>%
  mutate(is_violence = detect_violence(tolower(tweet)) &
           grepl('españ|espan|catal|barcelon', tolower(tweet))) %>% 
 group_by(date,username) %>%
  summarise(n = length(which(is_violence)),
            denom = n())
left <- expand.grid(date = seq(min(plot_data$date),
                              max(plot_data$date),
                              by = 1),
                   username = sort(unique(plot_data$username)))
plot_data <- left_join(left, plot_data)
plot_data <- plot_data %>% filter(date >= '2018-07-13',
                    date <= '2018-08-08')
plot_data$n <- ifelse(is.na(plot_data$n), 0, plot_data$n)
ggplot(data = plot_data,
       aes(x = date,
           y = n,
           fill = username)) +
  geom_bar(stat = 'identity',alpha = 0.9) +
    # geom_smooth(se = FALSE) +
  theme_databrew() +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 15)) +
  scale_x_date(breaks = unique(plot_data$date)) +
  labs(x = 'Date',
       y = 'Tweets',
       title = 'Tweets by Spanish newspapers with violence-related words',
       subtitle = 'Words: "Violence, violent" and geo-reference to Spain/Catalonia/Barcelona. Red line = 24 July 2018') +
  scale_fill_manual(name = '',
                    values = databrew::make_colors(n = length(unique(plot_data$username)))) +
   geom_vline(xintercept = as.Date('2018-07-24'),
             color = 'red',
             alpha = 0.7,
             lty = 2)

The charts above show a notable peak in violence-words on the dates of actual violence. In both the taxi protests and the Charlottesville protests, these newspapers increased the frequency of tweets with violence-words. Why did they not do so after Mr. Cuixart's protest, if it was indeed a "violent" and "public" uprising?
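The geo-tagged charts count a tweet only when two independent pattern matches both succeed: one for a violence word and one for a place name. The sketch below illustrates that logic with made-up tweets. The `detect_violence()` helper the charts call is defined in `prepare_data.R`, which is not shown, so `detect_violence_sketch()` here is a hypothetical stand-in that assumes a simple substring match on "violen"; only the geo-tag pattern is taken from the chunk above.

```r
# Hypothetical stand-in for the document's detect_violence() helper;
# the real definition lives in prepare_data.R and may differ.
detect_violence_sketch <- function(x) {
  grepl("violen", x)  # matches "violencia", "violento", "violence", "violent"
}

# Geo-tag pattern taken from the chunk above ("\u00f1" is the letter n-tilde).
geo_pattern <- "espa\u00f1|espan|catal|barcelon"

# TRUE only when a tweet has BOTH a violence word and a geographical tag.
flag_geo_violence <- function(tweets) {
  txt <- tolower(tweets)
  detect_violence_sketch(txt) & grepl(geo_pattern, txt)
}

# Made-up example tweets (ASCII spellings for portability).
tweets <- c(
  "Violencia en Barcelona tras la protesta de los taxistas",
  "Violencia de genero: nuevo caso en Madrid",
  "Protesta pacifica en Catalunya",
  "Un violento altercado en Catalunya"
)
flag_geo_violence(tweets)
```

Only the first and fourth tweets are flagged: the second mentions violence but no relevant place, and the third names a place without a violence word. Because `grepl()` matches substrings, very short patterns can also catch unrelated words (for example, "usa" also occurs inside the Spanish word "causa"), which is worth keeping in mind when reading the geo-tagged counts.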

As a final point of comparison, let's examine the protests which took place on October 1, 2018 (one year after the referendum). The same newspapers which tweeted violence-words with a relevant geographical tag only 8 times in the three days during and after the events for which Cuixart faces charges used such terms far more often here: from October 1 to October 3, 2018, they tweeted violence-words 55 times. Most of these tweets concerned the protests at the Catalan Parliament (for which nobody is being charged with rebellion or sedition).

plot_data <- tl%>%
  filter(username %in% newspapers) %>%
  mutate(is_violence = detect_violence(tolower(tweet)) &
           grepl('españ|espan|catal|barcelon', tolower(tweet))) %>% 
 group_by(date,username) %>%
  summarise(n = length(which(is_violence)),
            denom = n())
left <- expand.grid(date = seq(min(plot_data$date),
                              max(plot_data$date),
                              by = 1),
                   username = sort(unique(plot_data$username)))
plot_data <- left_join(left, plot_data)
plot_data <- plot_data %>% filter(date >= '2018-09-15',
                    date <= '2018-10-15')
plot_data$n <- ifelse(is.na(plot_data$n), 0, plot_data$n)
ggplot(data = plot_data,
       aes(x = date,
           y = n,
           fill = username)) +
  geom_bar(stat = 'identity',alpha = 0.9) +
    # geom_smooth(se = FALSE) +
  theme_databrew() +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 15)) +
  scale_x_date(breaks = unique(plot_data$date)) +
  labs(x = 'Date',
       y = 'Tweets',
       title = 'Tweets by Spanish newspapers with violence-related words',
       subtitle = 'Words: "Violence, violent" and geo-reference to Spain/Catalonia/Barcelona. Red line = 1 October 2018') +
  scale_fill_manual(name = '',
                    values = databrew::make_colors(n = length(unique(plot_data$username)))) +
   geom_vline(xintercept = as.Date('2018-10-01'),
             color = 'red',
             alpha = 0.7,
             lty = 2)

If a violent rebellion took place during the protest of September 20-21, 2017, why did these newspapers tweet violence-words nearly seven times as often (55 times versus 8) after the protest of October 1, 2018?
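Using the two counts reported above (8 geo-tagged violence tweets for September 20-22, 2017 and 55 for October 1-3, 2018), the size of the difference can be checked directly. The sketch distinguishes a ratio ("how many times as many") from a percentage increase, since the two are easy to conflate.

```r
# Counts reported in the text: violence-words with a relevant geo-tag
sep_2017 <- 8   # 20-22 September 2017
oct_2018 <- 55  # 1-3 October 2018

# How many times as many tweets in 2018 as in 2017
ratio <- oct_2018 / sep_2017                                  # 6.875

# Relative increase over the 2017 baseline, in percent
pct_increase <- (oct_2018 - sep_2017) / sep_2017 * 100        # 587.5

c(ratio = ratio, pct_increase = pct_increase)
```

In other words, 55 is roughly seven times 8, which corresponds to an increase of a little under 600% over the 2017 baseline.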

plot_data <- tl %>%
  filter(date >= '2017-09-01',
         date <= '2018-07-31') %>%
  filter(username %in% newspapers) %>%
  mutate(is_violence = detect_violence(tolower(tweet))) %>%
  # filter(is_violence) %>%
 group_by(date,username) %>%
  summarise(n = length(which(is_violence)),
            denom = n())
left <- expand.grid(date = seq(min(plot_data$date),
                              max(plot_data$date),
                              by = 1),
                   username = sort(unique(plot_data$username)))

n_t <- plot_data %>% group_by(date) %>% summarise(n = sum(n, na.rm = T)) %>% filter(date >= '2017-09-20', date <= '2017-09-22') %>% summarise(x = sum(n)) %>% .$x

# taxis
n_tt <- plot_data %>% group_by(date) %>% summarise(n = sum(n)) %>% filter(date >= '2018-07-24', date <= '2018-07-26') %>% summarise(x = sum(n)) %>% .$x
# During the three day period from September 20-22 2017, the period during and immediately after the violent public uprising, these 8 newspapers issued only a total of `r n_t` tweets with the words "violent" or "violence".

# On the other hand, during the three day period from July 24-July 26, 2018, the period of the Barcelona taxi drivers protest, these 8 newspapers issued `r n_tt` tweets with the words "violent" or "violence".

Let's take a look at another case: the 2016 Turkish military coup d'etat attempt. This was very much a violent and public uprising, one more consistent with what the Spanish Penal Code describes as "rebellion".

The chart below shows the frequency of the words "violent" and "violence" in tweets from international news accounts during the Turkish coup attempt of July 2016, counting only tweets which also contain a relevant geographical tag ("Ankara", "Istanbul", "Turkey" or "Turkish"). Note that, as with the Charlottesville and Barcelona taxi protests, the increase in tweets coincides with the actual events.

plot_data <- turkey %>%
  filter(is_international) %>%
  mutate(is_violence = detect_violence(tolower(tweet)) &
           grepl('turk|ankara|istanbu', tolower(tweet))) %>%
  # filter(is_violence) %>%
 group_by(date,username) %>%
  summarise(n = length(which(is_violence)),
            denom = n())
left <- expand.grid(date = seq(min(plot_data$date),
                              max(plot_data$date),
                              by = 1),
                   username = sort(unique(plot_data$username)))
plot_data <- left_join(left, plot_data)

plot_data$n <- ifelse(is.na(plot_data$n), 0, plot_data$n)
ggplot(data = plot_data,
       aes(x = date,
           y = n,
           fill = username)) +
  geom_bar(stat = 'identity',alpha = 0.9) +
    # geom_smooth(se = FALSE) +
  theme_databrew() +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 15)) +
  scale_x_date(breaks = unique(plot_data$date)) +
  labs(x = 'Date',
       y = 'Tweets',
       title = 'Tweets by international newspapers with violence-related words',
       subtitle = 'Words: "Violence, violent" and "Turkey", "Turkish", "Ankara", "Istanbul". Red line = 15 July 2016') +
  scale_fill_manual(name = '',
                    values = databrew::make_colors(n = length(unique(plot_data$username)))) +
   geom_vline(xintercept = as.Date('2016-07-15'),
             color = 'red',
             alpha = 0.7,
             lty = 2)

C.

Tweets from news sources mentioning "Cuixart" during the "uprising" were low.

If Cuixart had organized a public, violent uprising, we would expect a great deal of media coverage mentioning him in the days immediately following it. However, this is not the case.

The below shows tweets mentioning the name of Cuixart in the period immediately before and after the supposed violent uprising.

plot_data <- newspaper_tweets %>%
  mutate(is_cuixart = grepl('cuixart', tolower(tweet))) %>%
  filter(is_cuixart) %>%
  group_by(date,username) %>%
  tally
left <- expand.grid(date = seq(min(newspaper_tweets$date),
                              max(newspaper_tweets$date),
                              by = 1),
                   username = sort(unique(plot_data$username)))
plot_data <- left_join(left, plot_data)
plot_data <- plot_data %>% filter(date >= '2017-09-10',
                    date <= '2017-10-20')
plot_data$n <- ifelse(is.na(plot_data$n), 0, plot_data$n)
ggplot(data = plot_data,
       aes(x = date,
           y = n,
           fill = username)) +
  geom_bar(stat = 'identity',alpha = 0.9) +
    # geom_smooth(se = FALSE) +
  theme_databrew() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_x_date(breaks = unique(plot_data$date)) +
  labs(x = 'Date',
       y = 'Tweets',
       title = 'Tweets by Spanish newspapers about Cuixart',
       subtitle = 'Red line: 20 September 2017') +
  scale_fill_manual(name = '',
                    values = databrew::make_colors(n = length(unique(plot_data$username)))) +
     geom_vline(xintercept = as.Date('2017-09-20'),
             color = 'red',
             alpha = 0.7,
             lty = 2)

In the above it is clear that Cuixart's behavior did not merit significant media attention in the immediate days following the protests. Rather, the media became more interested in Cuixart one month later, when he was sent to prison.

What is more interesting? A person carrying out a violent and public uprising? Or a person being sent to prison for it?

Clearly, the former should be more interesting. The latter is only more interesting when it is a surprise, that is, when the actions carried out do not correspond with the public's expectation for the judicial reaction.
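The comparison above, between the days after the protest and the days after Cuixart was remanded in custody (16 October 2017), comes down to summing daily mention counts over two windows. The sketch below shows that step with made-up counts, not the document's data. The charts in this section fill empty days with `expand.grid()` plus `left_join()`; here `tidyr::complete()` is used instead as a more compact way to add zero-count days, and `window_total()` is a hypothetical helper introduced only for illustration.

```r
library(dplyr)
library(tidyr)

# Illustrative (made-up) daily counts of tweets mentioning a name;
# days with no mentions are simply absent, as after group_by() + tally().
mentions <- tibble(
  date = as.Date(c("2017-09-21", "2017-10-16", "2017-10-17")),
  n    = c(2L, 9L, 5L)
)

# Add the missing days with a count of zero so every date in the range appears.
daily <- mentions %>%
  complete(date = seq(as.Date("2017-09-15"), as.Date("2017-10-20"), by = "day"),
           fill = list(n = 0L))

# Hypothetical helper: total mentions between two dates (inclusive).
window_total <- function(df, from, to) {
  df %>%
    filter(date >= as.Date(from), date <= as.Date(to)) %>%
    summarise(total = sum(n)) %>%
    pull(total)
}

after_protest <- window_total(daily, "2017-09-20", "2017-09-22")
after_prison  <- window_total(daily, "2017-10-16", "2017-10-18")
c(after_protest = after_protest, after_prison = after_prison)
```

With these toy numbers the protest window totals 2 mentions and the imprisonment window 14. Zero-filling matters mainly for plotting and for averages over days; for plain window sums it leaves the totals unchanged.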

D.

Tweets from news sources mentioning the organizers of actual violent events correspond more closely with the moment of violence.

Let's compare the above chart with tweets mentioning the organizers of a truly violent protest in Charlottesville. In the days following the protest, news outlets tweeted the names of the protest organizers multiple times.

plot_data <- tl %>%
  filter(username %in% tolower(news$user_name)) %>%
  mutate(is_violence = grepl('spencer|kessler', tolower(tweet))) %>%
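    # Adjust for time zone: tweets timestamped before 06:00 are assigned
    # to the previous day, approximating the local (US) calendar date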
  mutate(date = as.numeric(date)) %>%
  mutate(date = ifelse(as.numeric(substr(time, 1, 2)) <6,
                       date - 1,
                       date)) %>%
  mutate(date = as.Date(date, origin ='1970-01-01')) %>%
 group_by(date,username) %>%
  summarise(n = length(which(is_violence)),
            denom = n()) %>%
  ungroup %>%
  filter(date >= '2017-08-01',
         date <= '2017-08-30')
left <- expand.grid(date = seq(min(plot_data$date),
                              max(plot_data$date),
                              by = 1),
                   username = sort(unique(tolower(news$user_name))))
plot_data <- left_join(left, plot_data)

plot_data$n <- ifelse(is.na(plot_data$n), 0, plot_data$n)
ggplot(data = plot_data,
       aes(x = date,
           y = n,
           fill = username)) +
  geom_bar(stat = 'identity',alpha = 0.9) +
    # geom_smooth(se = FALSE) +
  theme_databrew() +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 15)) +
  scale_x_date(breaks = unique(plot_data$date)) +
  labs(x = 'Date',
       y = 'Tweets',
       title = 'Tweets by international newspapers with mention\nof "Spencer" and/or "Kessler"',
       subtitle = 'Red line = 12 August 2017') +
  scale_fill_manual(name = '',
                    values = databrew::make_colors(n = length(unique(plot_data$username)))) +
  geom_vline(xintercept = as.Date('2017-08-12'),
             color = 'red',
             alpha = 0.7,
             lty = 2)

Conclusion

Did Jordi Cuixart carry out a public and violent uprising on the night of September 20th, 2017? According to the data from Twitter, Google, and Wikipedia, the answer is unequivocally no.

Rebellion is, by definition in the Spanish Penal Code, violent and public.

Violence is attention-worthy: when it occurs, politicians (particularly those opposed to the executors of the violence) tweet about it. Violence is news-worthy: when it occurs, newspapers write about it. Violence is interest-worthy: when it occurs, people search for the names of its organizers on google and wikipedia to learn more.

How is it, then, that Jordi Cuixart managed to carry out a violent and public uprising and almost nobody noticed at the time?

How is it, then, that in the immediate days after the uprising, the only references to "violence" from Jordi Cuixart's political opponents referred to other events, like simple assaults or gender violence?

How is it, then, that the national and international media present at the protest of September 20-21 did not report newsworthy levels of violence?

How is it, then, that these same media tweet about violence in a timely fashion and at higher frequencies when violence occurs in other situations, but did not tweet about violence for Jordi Cuixart's supposed uprising?

How is it, then, that interest in Jordi Cuixart did not increase until he was sent to prison?

In real violent events, the patterns of social media, news, and search frequencies are similar: they peak rapidly in the immediate aftermath of the violence and decline thereafter. This is the case both for violence-words ("violence" and "violent") and for the organizers of violence (Kessler and Spencer). It is the case in the Barcelona taxi protests, the failed Turkish military coup, and the violent Charlottesville protests.

But for the events for which Jordi Cuixart is being charged, this is simply not the case. There is no notable peak in references to or searches of him, even by those most opposed to his political aspirations. On the contrary, the peak does not come until later, when he is sent to prison. In other words, his entry into prison was considered more newsworthy than the supposed violent and public uprising.

Why would a violent criminal being sent to prison be considered so interesting? Why would the supposed crime be considered so uninteresting to the general public, the media, and politicians?

Could it be that the judiciary's actions were considered more shocking than Mr. Cuixart's?

The data surrounding Mr. Cuixart's protest and his entry into prison show significantly more surprise and interest in the latter event than in the former. This is not consistent with a major, violent event. Rather, it is consistent with public surprise at a disproportionate decision to imprison an explicitly pacifist activist for having participated, among many others, in the organization of an explicitly peaceful protest.

The reason there was so little interest in and chatter about Cuixart immediately following the protest was because protest is, simply, normal. And the reason there was so much interest in and chatter about Cuixart immediately following his imprisonment was because imprisoning social activists is, simply, not normal.

Mr. Cuixart is not a violent rebel.

Appendix

https://github.com/joebrew/vilaweb/tree/master/analyses/sep20/appendix.md

# newspaper_df <- 
#   data_frame(file = dir('newspaper_headlines')) %>%
#   mutate(paper = unlist(lapply(lapply(strsplit(file, '_'), function(x){x[2]}), function(y){gsub('.jpg', '', y, fixed = T)}))) %>%
#   mutate(date = as.Date(substr(file, 1, 10))) %>%
#   mutate(file = paste0('newspaper_headlines/', file))
# 
# sub_newspapers <- newspaper_df %>%
#   filter((date >= '2017-09-21'&
#          date <= '2017-09-22'))
# sub_newspapers <- sub_newspapers %>%
#   arrange(date, paper)
# for(i in 1:nrow(sub_newspapers)){
#    this_file <- sub_newspapers$file[i]
#   new_person <- FALSE
#   if(i == 1){
#     new_person <- TRUE
#   } else if(sub_newspapers$date[i] != sub_newspapers$date[i-1]){
#     new_person <- TRUE
#   }
#   if(new_person){
#     cat(paste0('\n### ', as.character(sub_newspapers$date[i]), '\n\n'))
#   }
#   cat(paste0('\n##### ', as.character(sub_newspapers$date[i]), ', ',   as.character(sub_newspapers$paper[i]), '\n'))
#   cat(paste0('\n![',this_file,'](', this_file, ')\n'))
# }


joebrew/vilaweb documentation built on Sept. 11, 2020, 3:42 a.m.