README.md
In gvegayon/twitterreport: Out-of-the-Box Analysis and Reporting Tools for Twitter

twitterreport

Out-of-the-box analysis and reporting tools for twitter

About

While there are some (very neat) R packages focused on Twitter (namely twitteR and streamR), twitterreport is centered on providing analysis and reporting tools for Twitter data. The package's current version features:

Access to the Twitter API
Extracting mentions/hashtags/URLs from text (tweets)
Gender tagging by matching user names with gender datasets included in the package (es and en)
Creating (mentions) networks and visualizing them using D3js
Sentiment analysis (basic, but useful) using lexicons included in the package (again, es and en)
Creating time series charts of hashtags/users/etc. and visualizing them using D3js
Creating wordclouds (after removing stop words and processing the text)
Map visualization using the leaflet package
Topic identification through the Jaccard coefficient (word similarity)

You can take a look at a live example at http://www.its.caltech.edu/~gvegayon/twitter/report_example.html, and at the source code of that example at https://github.com/gvegayon/twitterreport/blob/master/vignettes/report_example.Rmd

Some of the functions here were first developed in the project nodoschile.cl (no longer running). You can visit the project's testimonial website http://nodos.modularity.cl and the website (part of nodoschile) that motivated twitterreport at http://modularity.cl/presidenciales.

Installation

While the package is still in development, you can use devtools to install the most recent version from GitHub:

devtools::install_github('gvegayon/twitterreport')

Examples

Getting tweets from a set of users

# First, load the package!
library(twitterreport)

# List of twitter accounts
users <- c('MarsRovers', 'senatormenendez', 'sciencemagazine')

# Getting the tweets (first generate the token; replace 'myapp', 'key' and 'secret'
# with your own app's name and credentials)
key <- tw_gen_token('myapp', 'key', 'secret')
tweets <- lapply(users, tw_api_get_statuses_user_timeline, twitter_token = key)

# Combining the results into a single data frame (and taking a look)
tweets <- do.call(rbind, tweets)
head(tweets)

Creating a (fancy) network of mentions

# Loading data
data("senators")
data("senators_profile")
data("senate_tweets")

# Extracting the components (mentions, hashtags, ...) of each tweet
tweets_components <- tw_extract(senate_tweets$text)

# Node attributes: screen name (lowercased), party, and real name
groups <- data.frame(
  name      = senators_profile$tw_screen_name,
  group     = factor(senators$party),
  real_name = senators$Name,
  stringsAsFactors = FALSE)
groups$name <- tolower(groups$name)

# Network of mentions: tweet authors linked to the accounts they mention
senate_network <- tw_network(
  tolower(senate_tweets$screen_name),
  lapply(tweets_components$mention, unique), only.from = TRUE,
  group = groups, min.interact = 3)

plot(senate_network, nodelabel = 'real_name')
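To show the idea behind turning tweets into a network of mentions, here is a minimal base R sketch on made-up tweets. It is not how tw_extract or tw_network are implemented: the regular expression for mentions, the toy data, and the threshold of 2 interactions are assumptions for illustration only.

```r
# Made-up tweets for illustration only
tweets <- data.frame(
  screen_name = c("SenA", "SenA", "SenB", "SenA"),
  text = c("Thanks @SenB for the support",
           "Great meeting with @SenB and @SenC",
           "Agree with @SenA",
           "Again, thanks @SenB!"),
  stringsAsFactors = FALSE
)

# Extract @mentions from each tweet (lowercased, duplicates within a tweet dropped)
mentions <- regmatches(tweets$text, gregexpr("@[A-Za-z0-9_]+", tweets$text))
mentions <- lapply(mentions, function(m) unique(tolower(sub("^@", "", m))))

# Edge list: one row per (author, mentioned account) occurrence
edges <- data.frame(
  from = rep(tolower(tweets$screen_name), lengths(mentions)),
  to   = unlist(mentions),
  stringsAsFactors = FALSE
)
edges$n <- 1L

# Count interactions per directed pair and keep the frequent ones
weights <- aggregate(n ~ from + to, data = edges, FUN = sum)
strong  <- weights[weights$n >= 2, ]
strong
```

Here only the pair sena → senb survives the (hypothetical) threshold, with 3 mentions.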

The following examples use data on US senators extracted from Twitter through the REST API (the data are included in the package).

Creating a wordcloud

The function tw_words takes a character vector (of tweets, for example) and, for each element, removes stop words and symbols, returning a list with the remaining (lowercased) words. The plot method for its output creates a wordcloud.

data(senate_tweets)
tab <- tw_words(senate_tweets$text)

# What did it do?
senate_tweets$text[1:2];tab[1:2]

## [1] "“I am saddened by the news that four Marines lost their lives today in the service of our country.” #Chattanooga"         
## [2] ".@SenAlexander statement on today’s “tragic and senseless” murder of four Marines in #Chattanooga: http://t.co/H9zWdJPbiE"

## [[1]]
##  [1] "saddened"    "news"        "four"        "marines"     "lost"       
##  [6] "lives"       "today"       "service"     "country"     "chattanooga"
## 
## [[2]]
## [1] "senalexander" "statement"    "todays"       "tragic"      
## [5] "senseless"    "murder"       "four"         "marines"     
## [9] "chattanooga"

# Plot
set.seed(123) # (so the wordcloud always looks the same)
plot(tab, max.n.words = 40)
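The cleaning step described above can be approximated in a few lines of base R. This is a hypothetical sketch, not tw_words itself: the helper name simple_words, the tiny stop-word list, and the exact cleaning rules (dropping URLs, apostrophes, and non-alphanumeric symbols) are assumptions chosen to mimic the output shown earlier.

```r
# Hypothetical helper (not part of twitterreport): lowercase the text, drop URLs,
# symbols and stop words, and return the remaining words of each element
simple_words <- function(x, stopwords) {
  x <- tolower(x)
  x <- gsub("https?://[^[:space:]]+", " ", x)  # drop URLs
  x <- gsub("'", "", x, fixed = TRUE)         # "today's" -> "todays"
  x <- gsub("[^a-z0-9 ]", " ", x)             # symbols (#, @, punctuation) -> space
  words <- strsplit(x, "[[:space:]]+")
  lapply(words, function(w) w[nzchar(w) & !(w %in% stopwords)])
}

# Tiny made-up stop-word list (the package ships its own)
stop_en <- c("i", "am", "by", "the", "that", "their", "on")

words <- simple_words(
  c("I am saddened by the news that four Marines lost their lives",
    ".@SenAlexander statement on today's murder http://t.co/abc"),
  stop_en
)

# Word frequencies: the kind of input a wordcloud is drawn from
freq <- sort(table(unlist(words)), decreasing = TRUE)
words
```

The first element becomes "saddened" "news" "four" "marines" "lost" "lives", similar to the tw_words output above.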

Identifying individuals' gender

Using lists of English and Spanish names, the tw_gender function matches each element of its character argument (which can be a vector) with either a male or a female name, or marks it as unidentified.

data(senators_profile)

# Getting the first names: drop the "Sen."/"Senator" prefix, then keep only the first word
sen <- tolower(senators_profile$tw_name)
sen <- gsub('\\bsen(ator|\\.)\\s+', '', sen)
sen <- gsub('\\s+.+', '', sen)

tab <- table(tw_gender(sen))
barplot(tab)
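Conceptually, this kind of gender tagging is a lookup of each first name in a table of names with known gender. The base R sketch below uses a hypothetical five-name table and a hypothetical guess_gender helper; the package's own datasets and matching rules are not reproduced here.

```r
# Hypothetical name list (the package includes its own English and Spanish datasets)
names_db <- data.frame(
  name   = c("john", "mary", "maria", "jose", "elizabeth"),
  gender = c("male", "female", "female", "male", "female"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: exact match on the lowercased first name,
# anything not found in the table is labeled "unidentified"
guess_gender <- function(first_names, db) {
  out <- db$gender[match(tolower(first_names), db$name)]
  out[is.na(out)] <- "unidentified"
  out
}

g <- guess_gender(c("Elizabeth", "John", "Chris"), names_db)
table(g)
```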

Sentiment analysis

Here is an example classifying senate tweets about #irandeal.

irandeal <- subset(senate_tweets, grepl('irandeal',text, ignore.case = TRUE))
irandeal$sentiment <- tw_sentiment(irandeal$text, normalize = TRUE)

hist(irandeal$sentiment, col = 'lightblue', 
     xlab ='Valence (strength of sentiment)')
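Lexicon-based sentiment boils down to looking up each word's valence and aggregating per text. The following base R sketch uses a made-up four-word lexicon and a hypothetical lexicon_sentiment helper that simply sums valences; it does not attempt to reproduce the lexicons or the normalize option of tw_sentiment.

```r
# Hypothetical mini-lexicon mapping words to valences (the package ships its own)
lexicon <- c(good = 1, great = 2, bad = -1, terrible = -2)

# Hypothetical helper: sum the valences of the lexicon words found in each text
lexicon_sentiment <- function(text, lexicon) {
  words <- strsplit(gsub("[^a-z ]", " ", tolower(text)), "[[:space:]]+")
  vapply(words, function(w) sum(lexicon[w[w %in% names(lexicon)]]), numeric(1))
}

scores <- lexicon_sentiment(
  c("A great deal", "A terrible, bad deal", "No opinion"),
  lexicon
)
scores
```

Texts with no lexicon words get a score of 0, which is why neutral tweets pile up around zero in a histogram like the one above.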

A map using leaflet

The function tw_leaflet provides a convenient wrapper for the leaflet function of the package of the same name. It lets us visualize the number of tweets grouped geographically, as the following example shows:

tw_leaflet(senate_tweets, ~coordinates, nclusters = 4, radii = ~sqrt(n)*3e5)

Note that in this case only 14 tweets have a non-empty coordinates column, and they come from 4 different senators. With the nclusters option, tw_leaflet groups the data using the hclust function of the stats package, so the user doesn't need to aggregate the data beforehand.
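As a rough illustration of grouping points with hclust, the sketch below clusters made-up longitude/latitude pairs into k groups with cutree and computes one centroid and count per group. The coordinates, the choice of k = 3, and the use of plain Euclidean distance on raw coordinates are assumptions; tw_leaflet's internals may differ.

```r
# Made-up coordinates (longitude, latitude) for illustration only
pts <- data.frame(
  lng = c(-77.03, -77.05, -77.01, -118.24, -118.30, -74.00),
  lat = c( 38.90,  38.91,  38.88,   34.05,   34.10,  40.71)
)

# Hierarchical clustering on the coordinates, then cut the tree into k groups
hc <- hclust(dist(pts))
pts$cluster <- cutree(hc, k = 3)

# One row per cluster: centroid and number of points (n)
agg <- aggregate(cbind(lng, lat) ~ cluster, data = pts, FUN = mean)
agg$n <- as.vector(table(pts$cluster))
agg
```

A count like n could then drive the marker size, in the spirit of the radii = ~sqrt(n)*3e5 argument used above.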

Word closeness

An interesting question is how words relate to each other. The Jaccard coefficient measures the similarity between two sets as the size of their intersection divided by the size of their union; applied to words, the sets can be taken as the tweets in which each word appears. The jaccard_coef function implements this measure, and it helps us better understand topics, as the following example shows:

# Computing the jaccard coefficient
jaccard <- jaccard_coef(senate_tweets$text, max.size = 1000)

# See which words are related to 'veterans'
words_closeness('veterans', jaccard, .025)

##        word         coef
## 1  veterans 318.00000000
## 2        va   0.08982036
## 3      care   0.08510638
## 4     honor   0.04389313
## 5    access   0.04201681
## 6   deserve   0.04176334
## 7    health   0.04022989
## 8  benefits   0.03827751
## 9    mental   0.03733333
## 10  honored   0.03505155
## 11     home   0.03440860
## 12  service   0.03266788
## 13     july   0.03108808
## 14   combat   0.02964960
## 15 services   0.02857143
## 16   choice   0.02549575
## 17    thank   0.02529960

We can also do this with the output of tw_extract, that is, by passing a list of character vectors (which is much faster):

hashtags <- tw_extract(senate_tweets$text, obj = 'hashtag')$hashtag

# Again, but using a list
jaccard <- jaccard_coef(hashtags,max.size = 15000)
jaccard

## Jaccard index Matrix (Sparse) of 3283x3283 elements
## Contains the following words (access via $freq):
##          wrd   n
## 1   irandeal 202
## 2       iran 179
## 3     scotus 141
## 4        tpa 132
## 5      netde 119
## 6 mepolitics 117

# See which hashtags are related to 'veterans'
words_closeness('veterans',jaccard,.025)

##          word        coef
## 1    veterans 78.00000000
## 2 honorflight  0.06382979
## 3          va  0.05154639
## 4  miasalutes  0.05000000
## 5     4profit  0.04166667
## 6   choiceact  0.03658537
## 7 40mileissue  0.02564103
## 8        hepc  0.02531646
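To make the coefficient concrete, here is a minimal base R sketch that builds a dense Jaccard matrix from a list of tokenized documents and ranks the words closest to a given one. The helpers jaccard_matrix and closeness and the toy documents are hypothetical; jaccard_coef uses a sparse matrix and its own options (such as max.size), which this sketch does not reproduce. Note also that here a word's coefficient with itself is 1, whereas the words_closeness output above reports a different value (e.g., 318) on that first row.

```r
# Toy tokenized documents (e.g., the hashtags of four tweets)
docs <- list(
  c("veterans", "va", "care"),
  c("veterans", "care", "health"),
  c("iran", "deal"),
  c("veterans", "honor")
)

# Hypothetical helper: Jaccard similarity between every pair of words
jaccard_matrix <- function(docs) {
  vocab <- sort(unique(unlist(docs)))
  # Document-term incidence matrix: 1 if the word appears in the document
  m <- t(vapply(docs, function(d) as.numeric(vocab %in% d), numeric(length(vocab))))
  colnames(m) <- vocab
  both   <- crossprod(m)                          # documents containing both words
  n_docs <- diag(both)                            # documents containing each word
  either <- outer(n_docs, n_docs, "+") - both     # documents containing either word
  both / either
}

# Hypothetical helper: words whose similarity with `word` is at least `threshold`
closeness <- function(word, J, threshold) {
  s <- sort(J[word, ], decreasing = TRUE)
  s[s >= threshold & names(s) != word]
}

J <- jaccard_matrix(docs)
res <- closeness("veterans", J, 0.2)
res
```

Here "care" comes first (it appears in 2 of the 3 documents mentioning "veterans", so 2/3), followed by "va", "health" and "honor" at 1/3 each.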

Author

George G. Vega Yon

g vegayon at caltech

gvegayon/twitterreport documentation built on May 17, 2019, 9:30 a.m.
