knitr::opts_chunk$set(
  echo=TRUE, message = FALSE, warning = FALSE, autodep = TRUE,
  out.width = '600px', out.height = '300px'
  )

![Caltech Logo][caltechlogo]

library(twitterreport)
# Loading the data
data(senators)
data(senators_profile)
data(senate_tweets)

This document presents graphs and tables created with the package twitterreport. You can see the source code of this webpage at http://github.com/gvegayon/twitterreport.

The data used to ilustrate how the package works corresponds to the last 200 twitter messages emited by US senators retrieved through the twitter REST API using the function tw_api_get_statuses_user_timeline.

General stats

Most popular #hashtags

This graph shows the 5 most popular hashtags in this network through time including the number of times that these were mentioned in total (the table includes more).

# Creating stats (and data)
elements <- tw_extract(senate_tweets$text)

# Creating a xts type class object and using its plot method (tw_Class_ts)
hashtags <- tw_timeseries(elements$hashtag,senate_tweets$created_at)
plot(hashtags)
# Table (default using hashtags)
twtab <- tw_table(elements)
plot(twtab,caption='Most popular hashtags')

Most popular @users

This graph shows the set of 5 of the most popular users. Just like the previous stats, while the graph shows the number of mentions through time, the table shows the total number of mentions.

# Graph
mentions <- tw_timeseries(elements$mention,created_at = senate_tweets$created_at)
plot(mentions)
# Creating table for users
plot(tw_table(elements,'mention'),caption='Most popular users')

Mention Networks

This is the graph of conversations between US senators. Colored by party (light blue are democrats, blue are republicans and orange is independent), the thickness of the edges (links) represent the number of times that one senator mentions the other. Notice that interestingly Democrats and Republicans tend to group around while Sen. Angus King (only independent in the graph) is right in between the two groups.

In this case I consider two senators connected iff there one of them appears at least 3 times in the other senator's status timeline.

tweets_components <- tw_extract(senate_tweets$text)
groups <- data.frame(
  name      = senators_profile$tw_screen_name,
  group     = factor(senators$party),
  real_name = senators$Name,
  stringsAsFactors = FALSE)
groups$name <- tolower(groups$name)

senate_network <- tw_network(
  tolower(senate_tweets$screen_name),
  lapply(tweets_components$mention,unique),only.from = TRUE,
  group=groups, min.interact = 3)

plot(senate_network, nodelabel='real_name')

Word cloud

The function tw_words takes a character vector (of tweets for example) and extracts all the stopwords+symbols. And the plot method for its output creates a wordcloud

plot(tw_words(senate_tweets$text), max.n.words = 50)

Identifying individuals gender

Using english and spanish names, the tw_gender function matches the character argument (which can be a vector) with either a male or female name (or unidentified).

# Getting the names
sen <- tolower(senators_profile$tw_name)
sen <- gsub('\\bsen(ator|\\.)\\s+','',sen)
sen <- gsub('\\s+.+','',sen)

otab <- table(tw_gender(sen))
tab <- as.data.frame(otab)
colnames(tab) <- c('Gender','count')

if (requireNamespace("ggvis", quietly=TRUE)) {
  library(ggvis)
  tab %>% ggvis(~Gender, ~count) %>% layer_bars()
} else {
  barplot(otab)
}

Sentiment analysis

Here we have an example clasifying senate tweets on #irandeal. Again, using ggvis we plot an histogram to show the output

irandeal <- subset(senate_tweets, grepl('irandeal',text, ignore.case = TRUE))
irandeal$sentiment <- tw_sentiment(irandeal$text, normalize = TRUE)

if (requireNamespace("ggvis", quietly=TRUE)) {
  library(ggvis)
  irandeal %>% ggvis(~sentiment) %>% layer_histograms()
} else {
  hist(irandeal$sentiment)
}

A map using leaflet

The function tw_leaflet provides a nice wrapper for the function leaflet of\ the package of the same name. Using D3js, we can visualize the number of tweets grouped up geographically as the following example shows:

# Getting the data
data(senate_tweets)

# Aggregating until get only 3 big groups
tw_leaflet(senate_tweets,~coordinates, nclusters=4,radii = ~sqrt(n)*3e5)

Note that in this case there are 14 tweets with the coordinates column non-empty, leading to 4 different senators that have such information. Using the nclusters option, the tw_leaflet groups the data using the hclust function of the stats package. So the user doesn't need to worry about aggregating data.

[caltechlogo]: r system.file(package='twitterreport')/fig/Caltech_LOGO-Orange_RGB_10pc.png "Caltech"



gvegayon/twitterreport documentation built on May 17, 2019, 9:30 a.m.