knitr::opts_chunk$set( echo=TRUE, message = FALSE, warning = FALSE, autodep = TRUE, out.width = '600px', out.height = '300px' )
![Caltech Logo][caltechlogo]
library(twitterreport) # Loading the data data(senators) data(senators_profile) data(senate_tweets)
This document presents graphs and tables created with the package twitterreport
. You can see the source code of this webpage at http://github.com/gvegayon/twitterreport.
The data used to ilustrate how the package works corresponds to the last 200 twitter
messages emited by US senators retrieved through the twitter REST API using the function
tw_api_get_statuses_user_timeline
.
This graph shows the 5 most popular hashtags in this network through time including the number of times that these were mentioned in total (the table includes more).
# Creating stats (and data) elements <- tw_extract(senate_tweets$text) # Creating a xts type class object and using its plot method (tw_Class_ts) hashtags <- tw_timeseries(elements$hashtag,senate_tweets$created_at) plot(hashtags)
# Table (default using hashtags) twtab <- tw_table(elements) plot(twtab,caption='Most popular hashtags')
This graph shows the set of 5 of the most popular users. Just like the previous stats, while the graph shows the number of mentions through time, the table shows the total number of mentions.
# Graph mentions <- tw_timeseries(elements$mention,created_at = senate_tweets$created_at) plot(mentions)
# Creating table for users plot(tw_table(elements,'mention'),caption='Most popular users')
This is the graph of conversations between US senators. Colored by party (light blue are democrats, blue are republicans and orange is independent), the thickness of the edges (links) represent the number of times that one senator mentions the other. Notice that interestingly Democrats and Republicans tend to group around while Sen. Angus King (only independent in the graph) is right in between the two groups.
In this case I consider two senators connected iff there one of them appears at least 3 times in the other senator's status timeline.
tweets_components <- tw_extract(senate_tweets$text) groups <- data.frame( name = senators_profile$tw_screen_name, group = factor(senators$party), real_name = senators$Name, stringsAsFactors = FALSE) groups$name <- tolower(groups$name) senate_network <- tw_network( tolower(senate_tweets$screen_name), lapply(tweets_components$mention,unique),only.from = TRUE, group=groups, min.interact = 3) plot(senate_network, nodelabel='real_name')
The function tw_words
takes a character vector (of tweets for example) and extracts all the stopwords+symbols. And the plot
method for its output creates a wordcloud
plot(tw_words(senate_tweets$text), max.n.words = 50)
Using english and spanish names, the tw_gender
function matches the character argument (which can be a vector) with either a male or female name (or unidentified).
# Getting the names sen <- tolower(senators_profile$tw_name) sen <- gsub('\\bsen(ator|\\.)\\s+','',sen) sen <- gsub('\\s+.+','',sen) otab <- table(tw_gender(sen)) tab <- as.data.frame(otab) colnames(tab) <- c('Gender','count') if (requireNamespace("ggvis", quietly=TRUE)) { library(ggvis) tab %>% ggvis(~Gender, ~count) %>% layer_bars() } else { barplot(otab) }
Here we have an example clasifying senate tweets on #irandeal. Again, using ggvis
we plot an histogram to show the output
irandeal <- subset(senate_tweets, grepl('irandeal',text, ignore.case = TRUE)) irandeal$sentiment <- tw_sentiment(irandeal$text, normalize = TRUE) if (requireNamespace("ggvis", quietly=TRUE)) { library(ggvis) irandeal %>% ggvis(~sentiment) %>% layer_histograms() } else { hist(irandeal$sentiment) }
The function tw_leaflet
provides a nice wrapper for the function leaflet
of\
the package of the same name. Using D3js, we can visualize the number of tweets
grouped up geographically as the following example shows:
# Getting the data data(senate_tweets) # Aggregating until get only 3 big groups tw_leaflet(senate_tweets,~coordinates, nclusters=4,radii = ~sqrt(n)*3e5)
Note that in this case there are 14 tweets with the coordinates
column non-empty, leading to 4
different senators that have such information. Using the nclusters
option, the tw_leaflet
groups the data using the hclust
function of the stats package. So the user doesn't need to worry about aggregating data.
[caltechlogo]: r system.file(package='twitterreport')
/fig/Caltech_LOGO-Orange_RGB_10pc.png "Caltech"
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.