The twittrcademic package provides a collection of utility functions supporting the retrieval of Tweet data via Twitter v2 API endpoints in the Twitter Academic Research product track.
This package has been set up as a personal library to collect Tweet data for academic research. The package does not provide any functions to analyze the retrieved data.
The API endpoints in the Twitter Academic Research product track offer access to the full Tweet archive. These endpoints rely on the Twitter API v2, whose Tweet object model differs significantly from that of the v1.1 API. In addition to structural differences in the JSON responses, the v2 endpoints require that most objects and attributes (of, for example, a Tweet object) be explicitly specified in the API request in order to be included in the response. (By default the v2 search endpoint JSON contains only the Tweet ID and text.)
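To illustrate this point, here is a sketch of a raw v2 request made independently of this package, using the httr package; the `tweet.fields` request parameter is part of the documented Twitter API v2 and must list every additional Tweet attribute the response should contain:

```r
# Illustrative sketch only (not a function of this package):
# a direct request against the v2 full-archive search endpoint.
library(httr)

response <- GET(
  url = "https://api.twitter.com/2/tweets/search/all",
  add_headers(Authorization = paste("Bearer", "YOUR_BEARER_TOKEN")),
  query = list(
    query = "openscience",
    # Attributes beyond id and text must be requested explicitly:
    `tweet.fields` = "created_at,author_id,public_metrics"
  )
)
```

Without the `tweet.fields` entry the same request would return only the Tweet ID and text for each result.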
In order to use the functions in this package, API keys specifically for the Academic Research product track are required; standard API access keys will not work.
Install the development version of twittrcademic from GitHub with:
# install.packages("devtools")
devtools::install_github("sdaume/twittrcademic")
The package functions can be used to execute a single Tweet search API call against the /2/tweets/search/all endpoint or execute long-running searches for large result sets and store the results in multiple suitably sized batches.
This will return a JSON response of at most 500 Tweets, which could be processed directly with tools like jsonlite. The example below would return the 100 most recent Tweets containing the keyword openscience and posted on 12 June 2020 or earlier.
library(twittrcademic)

bearer <- oauth_twitter_token(consumerKey = "YOUR_ACADEMIC_PRODUCT_API_KEY",
                              consumerSecret = "YOUR_ACADEMIC_PRODUCT_API_SECRET")

json_response <- search_tweets(queryString = "openscience",
                               maxResult = 100,
                               toDate = "2020-06-12",
                               twitterBearerToken = bearer)
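The returned JSON could then be parsed with jsonlite, as mentioned above. A minimal sketch, assuming `json_response` holds the raw JSON text and that the v2 response keeps the Tweets under the top-level "data" element:

```r
# Sketch: parsing the search response with jsonlite.
library(jsonlite)

parsed <- fromJSON(json_response)
tweets <- parsed$data    # data frame with (at least) 'id' and 'text' columns
head(tweets$text)
```

Any fields requested beyond the defaults would appear as additional columns of the `data` element.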
The following example would collect all Tweets posted in the year 2020 that contain the term ‘planetary boundaries’. This will run until all results are retrieved. The results will be summarised into batches of files that contain approximately 20000 Tweets; files are stored in the working directory and all start with ‘query_label’ (for example query_label_20200101_20201231_1_20453.json); in addition to the base label the file name indicates the date range (implicit or explicit) of the query, a numeric index for the batch and the number of Tweet results in the given batch.
library(twittrcademic)

bearer <- oauth_twitter_token(consumerKey = "YOUR_ACADEMIC_PRODUCT_API_KEY",
                              consumerSecret = "YOUR_ACADEMIC_PRODUCT_API_SECRET")

search_and_store_tweets(queryString = "planetary boundaries",
                        fromDate = "2020-01-01",
                        toDate = "2020-12-31",
                        maxBatchSize = 20000,
                        batchBaseLabel = "query_label",
                        twitterBearerToken = bearer)
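The stored batches can later be read back in for processing. A sketch, assuming the default working-directory location, the ‘query_label’ base label used above, and that each batch file contains a v2-style response with the Tweets under a "data" element (the exact internal structure of the stored files is an assumption here):

```r
# Sketch: collecting all stored batch files for one query into a
# single data frame. File names follow the pattern described above,
# e.g. query_label_20200101_20201231_1_20453.json
library(jsonlite)

batch_files <- list.files(pattern = "^query_label_.*\\.json$")
all_tweets <- do.call(rbind,
                      lapply(batch_files, function(f) fromJSON(f)$data))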
The package is shared under an MIT License.
This package has been developed to support research at the Stockholm Resilience Centre; this research has benefited from funding by the Swedish Research Council for Sustainable Development (Formas).
This package has been developed as a reusable tool for the authors’ own research and comes with no guarantee of the correctness or completeness of the retrieved data.