In v0.2 of the package, we include functionality to convert JSON files to various data frame formats. In order to use these features, we recommend the following workflow.
First, build your query using the `build_query` function.

    library(academictwitteR)
    library(tibble)

    # Build a query for two hashtags, restricted to the place "Berlin"
    my_query <- build_query(c("#ichbinhanna", "#ichwarhanna"), place = "Berlin")
    my_query

Then, use `get_all_tweets` to collect the data. Make sure to specify `data_path` and set `bind_tweets` to `FALSE`.

    get_all_tweets(
      query = my_query,
      start_tweets = "2021-06-01T00:00:00Z",
      end_tweets = "2021-06-20T00:00:00Z",
      n = Inf,
      data_path = "tweetdata",  # directory that receives the JSON files
      bind_tweets = FALSE       # do not bind the results into a data frame yet
    )

The first format is the so-called "vanilla" format, which is the direct output of `jsonlite::read_json`. It displays columns such as `text` just fine, but some columns, such as `retweet_count`, are nested in list-columns. To extract user information as well, it is additionally necessary to set `user = TRUE`. Please also note that the data frame returned in this format is not a tibble, so we first convert it to a tibble with `as_tibble`.
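
    # Bind the JSON files collected above
    bind_tweets(data_path = "tweetdata") %>% as_tibble()

    # The same call on the sample data shipped with the package
    bind_tweets(system.file("extdata", "tweetdata", package = "academictwitteR")) %>% as_tibble()
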
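To illustrate what a nested column means in practice, here is a minimal base R sketch. It uses a toy data frame with made-up values rather than the package's real output, and it assumes that the metrics sit in a nested data frame column called `public_metrics`; the actual layout may differ, so inspect your own data with `str()` first.

```r
# Toy stand-in for the vanilla output (hypothetical values, not real tweets).
tweets <- data.frame(
  id = c("1", "2"),
  text = c("first tweet", "second tweet"),
  stringsAsFactors = FALSE
)

# Assumption: the metrics are stored in a nested data frame column
# named public_metrics.
tweets$public_metrics <- data.frame(
  retweet_count = c(3L, 0L),
  like_count = c(10L, 2L)
)

# Pull the nested column up to the top level.
tweets$retweet_count <- tweets$public_metrics$retweet_count

tweets[, c("id", "text", "retweet_count")]
```

Once the value is a top-level column, it can be filtered and sorted like any other column.
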
The second format is the "raw" format. It is a list of data frames containing all of the data extracted in the API call. Please note that not all of these data frames are fully normalized (for example, in Boyce-Codd normal form): some columns are still list-columns.
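
    # List the data frames contained in the raw output
    bind_tweets(data_path = "tweetdata", output_format = "raw") %>% names()

    # The same call on the sample data shipped with the package
    bind_tweets(system.file("extdata", "tweetdata", package = "academictwitteR"), output_format = "raw") %>% names()
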
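Because the raw format is a list of data frames, a typical next step is to pick tables by name and combine them on a shared key. The sketch below shows that pattern in base R on a toy list. The element names (`tweets`, `users`) and the values are hypothetical and chosen for illustration only; check `names()` on your own output for the real ones.

```r
# Toy stand-in for the "raw" output: a named list of data frames.
# Element names and values are hypothetical, for illustration only.
raw <- list(
  tweets = data.frame(
    tweet_id = c("1", "2", "3"),
    author_id = c("a", "b", "a"),
    stringsAsFactors = FALSE
  ),
  users = data.frame(
    author_id = c("a", "b"),
    username = c("alice", "bob"),
    stringsAsFactors = FALSE
  )
)

# Inspect which tables are available.
names(raw)

# Pick two tables by name and join them on the shared key.
tweets_with_users <- merge(raw[["tweets"]], raw[["users"]], by = "author_id")
tweets_with_users
```
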
The third format is the "tidy" format. It is an "opinionated" format, which we believe contains all essential columns for social media research. By default, it is a tibble.
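
    # Bind the JSON files collected above into the tidy format
    bind_tweets(data_path = "tweetdata", output_format = "tidy")

    # The same call on the sample data shipped with the package
    bind_tweets(system.file("extdata", "tweetdata", package = "academictwitteR"), output_format = "tidy")
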
It has the following features / caveats:

- Tweets, authors and source tweets are identified by `tweet_id`, `author_id` and `sourcetweet_id` respectively.
- The `text` field of a retweet is truncated. However, the full text of the original tweet is located in `sourcetweet_text`.
- Replies are not treated as source tweets, so they have no `sourcetweet_text`. If you need that data, please follow the clue using the `conversation_id`.
- Some tweet-level columns derived from `text` by Twitter are not available in the tidy format, e.g. the lists of hashtags, cashtags, urls, entities, context annotations etc. If you need those columns, please consider using the "raw" format above.
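
The retweet caveat above can be handled with a small helper column. The following base R sketch works on a toy data frame with made-up values. It assumes a column named `sourcetweet_type` that marks retweets with the value `"retweeted"`; that name and value are assumptions, so verify them against the columns of your own tidy output.

```r
# Toy stand-in for the tidy output (hypothetical values, not real tweets).
tidy <- data.frame(
  tweet_id = c("1", "2", "3"),
  text = c(
    "RT @someone: a long tweet that gets cut...",
    "an original tweet",
    "my comment on a quoted tweet"
  ),
  # Assumption: this column marks the kind of source tweet, NA if none.
  sourcetweet_type = c("retweeted", NA, "quoted"),
  sourcetweet_text = c(
    "a long tweet that gets cut off in the retweet",
    NA,
    "the tweet being quoted"
  ),
  stringsAsFactors = FALSE
)

# %in% treats NA as "not a retweet", so no extra NA handling is needed.
is_retweet <- tidy$sourcetweet_type %in% "retweeted"

# Use the untruncated source text for retweets, the tweet's own text otherwise.
tidy$full_text <- ifelse(is_retweet, tidy$sourcetweet_text, tidy$text)

tidy[, c("tweet_id", "full_text")]
```

Quoted tweets keep their own `text` here, because for a quote `sourcetweet_text` holds the quoted tweet rather than the author's comment.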