Prior to streaming, make sure to install and load rtweet.
This vignette assumes users have already set up a user access token (see the "auth" vignette: `vignette("auth", package = "rtweet")`).
```r
## Load rtweet
library(rtweet)
```
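If a token was created and saved in a previous session, it also needs to be loaded before streaming. A minimal sketch is shown below; the token name "default" is only an example, and `auth_as()` is assumed to be available as described in the auth vignette.

```r
## Load a previously saved token (the name "default" is illustrative;
## see vignette("auth", package = "rtweet") for creating and saving tokens)
auth_as("default")
```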
In addition to accessing Twitter's REST API (e.g., `search_tweets()`, `get_timeline()`), rtweet makes it possible to capture live streams of Twitter data using the `stream_tweets()` function.
By default, `stream_tweets()` will stream for 30 seconds and return a random sample of tweets. To modify the default settings, `stream_tweets()` accepts several parameters, including `q` (query used to filter tweets), `timeout` (duration of the stream, in seconds), and `file_name` (path for saving the raw json data).
```r
## Stream keywords used to filter tweets
q <- "#rstats"

## Stream time in seconds, so for one minute set timeout = 60
## For larger chunks of time, I recommend multiplying 60 by the number
## of desired minutes. This method scales up to hours as well
## (x * 60 = x mins, x * 60 * 60 = x hours)

## Stream for 30 seconds
streamtime <- 30

## Filename to save json data (backup)
filename <- "rstats.json"
```
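As an aside, the query can also track several phrases at once. The sketch below is illustrative only; it assumes a comma-separated `q` is treated as a set of tracked phrases (the keywords and file name are made up for this example).

```r
## Illustrative only: track several phrases at once and stream for five minutes
## (assumes comma-separated phrases in q are tracked as alternatives)
q_multi <- "#rstats,tidyverse,data science"
stream_tweets(q = q_multi, timeout = 60 * 5, file_name = "rstats_multi.json")
```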
Once these parameters are specified, initiate the stream. Note: barring any disconnection or disruption of the API, streaming will occupy your current instance of R until the specified time has elapsed. It is possible to start a new instance of R (streaming itself usually isn't very memory intensive), but operations may drag a bit during the parsing process, which takes place immediately after streaming ends.
```r
## Stream rstats tweets
stream_rstats <- stream_tweets(q = q, timeout = streamtime, file_name = filename)
#> Streaming tweets: 7 tweets written / 65.54 kB / 888.99 kB/s / 00:00:00
```
Parsing larger streams can take quite a bit of time (in addition to time spent streaming) due to a somewhat time-consuming simplifying process used to convert a json file into an R object. It is therefore recommended to save the stream to a file. Given the lengthy parsing process, users may want to stream tweets into json files upfront and parse those files later on.
To do this, simply add `parse = FALSE` and make sure to provide a path (file name) to a location you can find later. Regardless of whether you decide to set up a system for streaming data, the process of streaming to a file on disk and parsing it at a later point in time is the same, as illustrated in the example below.
```r
## No upfront parsing; save as a json file instead
## By default, results are appended to the file.
stream_tweets(
  q = q,
  parse = FALSE,
  timeout = streamtime,
  file_name = filename
)
#> Streaming tweets: 8 tweets written / 65.54 kB / 2.19 MB/s / 00:00:00
#> Streaming tweets: 14 tweets written / 131.07 kB / 16.21 kB/s / 00:00:08

## Parse from json file
stream_rstats <- parse_stream(filename)
```
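For longer collections, one option is to stream in shorter chunks that append to the same file and parse everything once at the end. The sketch below is only illustrative; the chunk length, number of chunks, and file name are assumptions, not part of the vignette's workflow.

```r
## Illustrative sketch: collect roughly 30 minutes of tweets in six
## 5-minute chunks, appending each chunk to the same json file,
## then parse the file once at the end.
chunk_file <- "rstats_long.json"
for (i in seq_len(6)) {
  stream_tweets(
    q = q,
    parse = FALSE,
    timeout = 60 * 5,
    file_name = chunk_file
  )
}
rstats_long <- parse_stream(chunk_file)
```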
The parsed object should be the same whether a user parses up front or parses from a json file in a later session. The returned object should be a data frame of tweets data.
```r
## Preview tweets data
head(stream_rstats)
#>            created_at           id              id_str
#> 1 2022-05-21 11:46:55 1.527949e+18 1527949049319530496
#> 2 2022-05-21 11:47:01 1.527949e+18 1527949070827933704
#>   text
#> 1 #Infographic: Types of \nhttps://t.co/Gdq7B6vFwY\n\n#MachineLearning\n#DataScience #programming #DataScientists… https://t.co/KTlhlz9euO
#> 2 RT @KirkDBorne: #DataScience Theories, Models, #Algorithms, and Analytics: https://t.co/w704rhiLS8 (FREE PDF Download, 462 pages)\n—————\n#Bi…
#> ... (remaining columns, e.g. source, lang, retweet_count, entities, retweeted_status, omitted for brevity)
#> [ reached 'max' / getOption("max.print") -- omitted 4 rows ]
```
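Because the tweets data is a regular data frame, the usual tools apply. For instance, the snippet below inspects a few columns; the column names (`created_at`, `lang`, `text`) are taken from the preview above, and the exact set of columns depends on what the stream returned.

```r
## Quick look at a few columns of the parsed tweets
## (column names taken from the preview above)
stream_rstats[, c("created_at", "lang", "text")]

## Tabulate tweet language
table(stream_rstats$lang)
```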
The returned object should also include a data frame of users data, which Twitter's stream API automatically returns along with tweets data. To access the users data, use the `users_data()` function.
```r
## Preview users data
head(users_data(stream_rstats))
#>             id              id_str             name   screen_name                 location                    url
#> 1 2.553201e+09          2553201139      Hameed Kahn   hameeduom41                Islamabad https://bit.ly/3oZvkPU
#> 2 3.622639e+07            36226389  rosangela maria      MMAPULLA Campo Grande, MS, Brazil                   <NA>
#> 3 9.716201e+17  971620109197312000   Serverless Fan ServerlessFan                Sri Lanka                   <NA>
#> 4 1.437729e+18 1437728609909706753 Developer Bot 🤖       GoaiDev                    India   https://thebotman.co
#> ... (description, followers_count, friends_count, and other profile columns omitted for brevity)
#> [ reached 'max' / getOption("max.print") -- omitted 2 rows ]
```
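The users data can be explored the same way. For example, the sketch below lists the accounts seen in the stream by follower count; the column names (`screen_name`, `followers_count`) are taken from the preview above.

```r
## Who tweeted during the stream, sorted by follower count
## (column names taken from the preview above)
users <- users_data(stream_rstats)
users[order(users$followers_count, decreasing = TRUE),
      c("screen_name", "followers_count")]
```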
Once parsed, `ts_plot()` provides a quick visual of the frequency of tweets. By default, `ts_plot()` will try to aggregate time by the day. Because I know the stream only lasted 30 seconds, I've opted to aggregate tweets by the second. It'd also be possible to aggregate by the minute, i.e., `by = "mins"`, or by some value of seconds, e.g., `by = "15 secs"`.
```r
## Plot time series of all tweets aggregated by second
ts_plot(stream_rstats, by = "secs")
```
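Since `ts_plot()` returns a ggplot object, the plot can be customized with the usual ggplot2 layers. A small sketch is shown below; the interval and labels are only examples.

```r
## Aggregate by 15-second intervals and add labels (labels are examples only)
library(ggplot2)
ts_plot(stream_rstats, by = "15 secs") +
  labs(
    x = NULL, y = NULL,
    title = "Frequency of #rstats tweets during a short stream",
    caption = "Source: Twitter streaming API via rtweet"
  )
```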