knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(rtweetclean)
library(knitr)
library(rtweet)

Getting an rtweet dataframe

# If you have a Twitter developer account/app, you can paste your access information in place of the Sys.getenv() calls below
appname <- Sys.getenv("appname")
key <- Sys.getenv("key")
secret <- Sys.getenv("secret")
access_token <- Sys.getenv("access_token")
access_secret <- Sys.getenv("access_secret")


twitter_token <- create_token(
  app = appname,
  consumer_key = key,
  consumer_secret = secret,
  access_token = access_token,
  access_secret = access_secret)

timeline_rtweet <- get_timeline("taylorswift13", n = 200)

head(timeline_rtweet)

Since not everyone has Twitter developer access, we have provided an example dataframe that we previously scraped (note: please set your working directory to rtweetclean/vignettes to load this dataframe).
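A minimal loading sketch, assuming the example data was saved as an .rds file; the file name example_timeline.rds here is hypothetical:

# Hypothetical file name; substitute the file actually shipped in rtweetclean/vignettes
timeline_rtweet_toy <- readRDS("example_timeline.rds")

Alternatively, a toy dataframe with the same key columns can be built by hand...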

created_at <- c("2021-03-06 16:03:31",
                "2021-03-05 21:57:47",
                "2021-03-05 05:50:50",
                "2021-03-05 07:32:33")
text <- c("example tweet text 1 @user2 @user",
          "#example #tweet 2 ",
          "example tweet 3 https://t.co/G4ziCaPond",
          "example tweet 4")
retweet_count <- c(43, 12, 24, 29)
favorite_count <- c(85, 41, 65, 54)
timeline_rtweet_toy <- data.frame(text, retweet_count, favorite_count, created_at)
head(timeline_rtweet_toy)

clean_df()

The first of our functions, clean_df(), modifies the timeline dataframe provided by rtweet by adding new columns that extract information rtweet does not provide on its own. The user can choose which columns are generated; the options are text_only, word_count, emojis, proportion_of_avg_retweets, and proportion_of_avg_favorites.

Each column is requested by setting its argument to TRUE or FALSE when calling clean_df(). All are TRUE by default.

clean_df <- function(raw_tweets_df,
                     text_only = TRUE,
                     word_count = TRUE,
                     emojis = TRUE,
                     proportion_of_avg_retweets = TRUE,
                     proportion_of_avg_favorites = TRUE)
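For instance, a quick sketch using the toy dataframe built above that skips the emoji column (the cleaned_no_emojis name is just illustrative):

# Generate all default columns except emojis
cleaned_no_emojis <- clean_df(timeline_rtweet_toy, emojis = FALSE)

Below we run clean_df() with all defaults and inspect the newly added column names...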

cleaned_timeline <- clean_df(timeline_rtweet_toy)
kable(tail(colnames(cleaned_timeline), 5))

We can see that new columns have been added to the dataframe. The contents of these columns can be seen below alongside the original text column provided by rtweet for comparison...

kable(head(subset(cleaned_timeline,
                  select = c(text, text_only, word_count, emojis,
                             prptn_rts_vs_avg, prptn_favorites_vs_avg)), 5))

tweet_words()

tweet_words() takes a dataframe returned by clean_df() and returns a list of the most frequently used words, based on the text_only column that clean_df() generates. The top_n argument specifies how many of the most frequent words to return.

tweet_words(clean_rtweet_dataframe, top_n = 5)

tweet_words_example <- tweet_words(cleaned_timeline, top_n=30)
kable(tweet_words_example)

sentiment_total()

Takes the dataframe output of clean_df() as input. Using the text_only column of tweets, it returns a dataframe summarizing the number of tweeted words associated with particular emotional sentiments, based on the crowd-sourced NRC Emotion Lexicon. By default, all sentiments listed in the lexicon are reported whether or not they appear in the dataset; if you wish to see only the sentiments present in the dataset, set drop_sentiment to TRUE.

sentiment_total(clean_rtweet_dataframe, drop_sentiment = FALSE)

sentiment_total_example <- sentiment_total(cleaned_timeline, drop_sentiment = TRUE)
kable(sentiment_total_example)

engagement_by_hour()

engagement_by_hour() returns a ggplot2 line plot of the combined total of likes and retweets a user received for each hour of the day at which their tweets were posted. It accepts either a dataframe returned by clean_df() or an unedited dataframe returned by the rtweet function get_timeline(), and builds the graph from the created_at column.

engagement_by_hour(clean_rtweet_dataframe)

engagement_by_hour(cleaned_timeline)
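Because engagement_by_hour() also accepts an unedited rtweet-style timeline, the same plot can be drawn straight from the toy dataframe built earlier:

# The toy dataframe has the created_at column the plot is built from
engagement_by_hour(timeline_rtweet_toy)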

