Prior to streaming, make sure to install and load rtweet.
This vignette assumes users have already set up app access tokens (see the "auth" vignette, vignette("auth", package = "rtweet")).
    ## Load rtweet
    library(rtweet)
    client_as("my_app")
rtweet makes it possible to capture live streams of Twitter data[^1].
[^1]: Until November 2022 this was possible with API v1.1; that is no longer the case, and streaming now uses API v2.
There are two ways of having a stream:
A stream collecting data from a set of rules, which can be collected via filtered_stream().
A stream of 1% of the tweets published, which can be collected via sample_stream().
In either case we need to choose how long the streaming connection should stay open and which file the data should be saved to.
    ## Stream time in seconds, so for one minute set timeout = 60.
    ## For larger chunks of time, I recommend multiplying 60 by the number
    ## of desired minutes. This method scales up to hours as well
    ## (x * 60 = x mins, x * 60 * 60 = x hours)
    ## Stream for 5 seconds
    streamtime <- 5
    ## Filename to save json data (backup)
    filename <- "rstats.json"
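The minutes and hours arithmetic described in the comments above can be wrapped in a small helper. This is only a sketch: `stream_seconds()` is a hypothetical function written for this example, not part of rtweet.

```r
## Hypothetical helper (not part of rtweet): convert hours, minutes and
## seconds into the total number of seconds to pass as `timeout`.
stream_seconds <- function(hours = 0, minutes = 0, seconds = 0) {
  total <- hours * 60 * 60 + minutes * 60 + seconds
  ## A stream needs a single, positive duration
  stopifnot(is.numeric(total), length(total) == 1, total > 0)
  total
}

stream_seconds(seconds = 5)               # 5
stream_seconds(minutes = 2)               # 120
stream_seconds(hours = 1, minutes = 30)   # 5400
```

The result can then be assigned to streamtime in place of a hand-computed number.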
The filtered stream collects tweets for all rules that are currently active, not just one rule or query.
Streaming rules in rtweet need a value and a tag. The value is the query to be performed, and the tag is the name used to identify tweets that match that query. A value can contain multiple words and hashtags; please read the official documentation for the query syntax. A single tweet can match multiple rules.
    ## Stream rules used to filter tweets
    new_rule <- stream_add_rule(list(value = "#rstats", tag = "rstats"))
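The value and tag structure passed to stream_add_rule() above can be prepared and checked offline before anything is sent to Twitter. The sketch below uses `make_rule()`, a hypothetical helper that is not part of rtweet, and the query string is only an illustration; it assumes the OR, lang: and -is:retweet operators from Twitter's query syntax, so check the official documentation before relying on it.

```r
## Hypothetical helper (not part of rtweet): build the list(value, tag)
## structure used in the stream_add_rule() call above.
make_rule <- function(value, tag) {
  stopifnot(is.character(value), length(value) == 1, nzchar(value))
  stopifnot(is.character(tag), length(tag) == 1, nzchar(tag))
  list(value = value, tag = tag)
}

## Illustrative value combining two hashtags with a language filter and
## a retweet exclusion (assumed operators, see the lead-in).
rule <- make_rule("(#rstats OR #tidyverse) lang:en -is:retweet", "r_english")
str(rule)
```

The resulting list has the same shape as the one in the example above, so it could be passed to stream_add_rule() in the same way.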
To find out whether any rule is currently active, call stream_add_rule() with NULL:
    rules <- stream_add_rule(NULL)
    rules
    #>   result_count                sent
    #> 1            1 2023-03-19 22:04:29
    rules(rules)
    #>                    id   value    tag
    #> 1 1637575790693842952 #rstats rstats
The rules() function returns the id, value and tag of each rule.
To remove rules, use stream_rm_rule():
    # Not evaluated now
    stream_rm_rule(ids(new_rule))
Note that if the rules are not used for some time, Twitter warns you that they will be removed.
Still, because filtered_stream() collects tweets for all active rules, it is advisable to keep the list of rules short and clean.
Once these parameters are specified, initiate the stream. Note: barring any disconnection or disruption of the API, streaming will occupy your current instance of R until the specified time has elapsed. It is possible to start a new instance of R (streaming itself usually isn't very memory intensive), but operations may drag a bit during the parsing process, which takes place immediately after streaming ends.
    ## Stream rstats tweets
    stream_rstats <- filtered_stream(timeout = streamtime, file = filename, parse = FALSE)
    #> Warning: No matching tweets with streaming rules were found in the time provided.
If no tweet matching the rules is detected within the streaming time, a warning is issued.
Parsing larger streams can take quite a bit of time (in addition to time spent streaming) due to a somewhat time-consuming simplifying process used to convert a json file into an R object.
Don't forget to clean the streaming rules:
    stream_rm_rule(ids(new_rule))
    #>                  sent deleted not_deleted
    #> 1 2023-03-19 22:04:51       1           0
The sample_stream() function doesn't need any rules.
    stream_random <- sample_stream(timeout = streamtime, file = filename, parse = FALSE)
    #> Found 316 records... Imported 316 records. Simplifying...
    length(stream_random)
    #> [1] 316
Users may want to stream tweets into json files upfront and parse those files later on.
To do this, simply add parse = FALSE and make sure you provide a path (file name) to a location you can find later.
You can also use append = TRUE to continue recording a stream into an already existing file.
Currently, parsing the streaming data file with parse_stream() is not functional.
However, you can read it back in with jsonlite::stream_in(file).
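As a minimal sketch of reading a saved file back in, the example below writes a small stand-in file and loads it with jsonlite. It assumes the file holds one JSON object per line; the field names are made up for illustration and are not the structure returned by Twitter. Note that jsonlite::stream_in() expects a connection, so the path is wrapped in file().

```r
library(jsonlite)

## Stand-in for a file written by a stream: a temporary file with one
## JSON object per line. The fields are invented for this example.
demo_file <- tempfile(fileext = ".json")
writeLines(c(
  '{"id": "1", "text": "first tweet"}',
  '{"id": "2", "text": "second tweet"}'
), demo_file)

## stream_in() takes a connection rather than a path, hence file()
tweets <- stream_in(file(demo_file), verbose = FALSE)

nrow(tweets)   # 2, one row per line of the file
tweets$text
```

With a real stream file, replace demo_file with the path given to the file argument when streaming; the columns will then depend on the fields and expansions requested.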
The parsed object should be the same whether a user parses up-front or from a json file in a later session.
Currently the returned object is a raw conversion of the feed into a nested list depending on the fields and extensions requested.