```r
knitr::opts_chunk$set(
  # code chunk options
  echo = TRUE
  , eval = TRUE
  , warning = FALSE
  , message = FALSE
  , cache = FALSE
  , exercise = TRUE
  , exercise.completion = TRUE
  # figure options
  , fig.align = "center"
  , fig.height = 4
  , fig.width = 5.5
)
```
```r
library(learnr)
library(learn2scrape)
```
In this tutorial, you'll learn how to obtain Twitter data from Twitter's REST API. The REST API provides access to the tweets of individual users as well as other user-level information.
We will use the following R packages:
```r
# to access the Twitter REST API
library(rtweet)

# data cleaning
library(jsonlite)
library(dplyr)
library(tidyr)
library(stringr)

# visualizing
library(ggplot2)
library(maps)

# text analysis
library(quanteda)
library(quanteda.textplots)
options(quanteda_threads = 1L)
```
Make sure that you have your Twitter API credentials ready:
```r
# ToDo: specify the path to your secrets JSON file
fp <- file.path(...)
credentials <- fromJSON(fp)
token <- do.call(create_token, credentials)
```
```r
credentials <- fromJSON(
  system.file("extdata", "tw_credentials.json", package = "learn2scrape")
)
token <- do.call(create_token, credentials)
```
Note: If you don't have credentials yet, first go through the steps described in the tutorial "103-twitter-setup" of the learn2scrape package: `learnr::run_tutorial("103-twitter-setup", package = "learn2scrape")`
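Because the credentials are passed to `create_token()` via `do.call()`, the JSON file is assumed to hold named fields matching `create_token()`'s arguments. A minimal sketch of the assumed structure (all values are placeholders, not real keys):

```r
# sketch: assumed structure of the credentials list; the field names
# follow rtweet's create_token() signature, all values are placeholders
credentials <- list(
  app             = "my_app_name",
  consumer_key    = "XXXXXXXX",
  consumer_secret = "XXXXXXXX",
  access_token    = "XXXXXXXX",
  access_secret   = "XXXXXXXX"
)
token <- do.call(create_token, credentials)
```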
Twitter allows you to download recent tweets based on keywords. However, the REST API will only return tweets that have been posted in the last 6 to 9 days (and in some cases not all of them). The available options for string search are described in Twitter's API documentation.
```r
tweets <- search_tweets(q = "covid", n = 10)
nrow(tweets)
head(tweets, 3)
```
By default, the Twitter API allows you to download up to 18,000 tweets per search request. To return more than 18,000 tweets with a single search, you can set `retryonratelimit = TRUE`.
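For illustration, a minimal sketch of such a larger query (the keyword and target `n` are arbitrary; with `retryonratelimit = TRUE`, rtweet waits out each rate-limit window, so this can take a while):

```r
# sketch: retrieve more than 18,000 tweets in one call;
# rtweet pauses automatically whenever the rate limit is hit
many_tweets <- search_tweets(
  q = "covid",
  n = 50000, # arbitrary illustrative target
  retryonratelimit = TRUE
)
```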
We can also search for tweets using a geocode. Hopefully, because the search includes tweets from the past few days, we will receive a fair number of results; we will look for tweets posted within 50 km of GESIS in Cologne. (You can look up a place's coordinates with any online map service.)
```r
# search for tweets created (by users) near GESIS
gesis <- search_tweets(
  geocode = "50.94268842250975,6.952277084819257,50km",
  n = 100
)

# extract latitude and longitude
gesis <- rtweet::lat_lng(gesis)

# plot the tweet locations on a map of Germany
maps::map("world", regions = "Germany")
with(gesis, points(lng, lat, col = rgb(0, .3, .7, .75)))
```
Instead of just looking for keywords or geo-coded tweets, you can also analyze specific users.
You can download up to the 3,200 most recent tweets from a Twitter account using `get_timeline()`.
`rtweet` also has some built-in functions for plotting the results, namely `ts_plot()`. Do you see any changes in the German Social Democrats' tweeting behavior over the past months?

Hint: if you want to query tweets for more than one user, use `get_timelines()`; see the sketch after the next chunk.
```r
tweets <- get_timeline("spdbt", n = 100)

# plot tweet frequency over time
ts_plot(tweets, color = "gray", lwd = .5) +
  geom_smooth() +
  scale_x_datetime(date_labels = "%b %Y") +
  theme_bw()
```
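Regarding the hint above, a minimal sketch of the multi-user variant (the second account handle is purely illustrative):

```r
# sketch: query the timelines of several accounts at once
# ("cducsubt" is an illustrative second handle)
party_tweets <- get_timelines(c("spdbt", "cducsubt"), n = 100)
```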
`get_timeline()` returns a lot of information associated with tweets. Most importantly, each row also carries user-level metadata such as the account's profile description. Hence, if you want to get users' self-descriptions, you can also use the `get_timeline()` function.
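To see what is available, you can inspect the columns of the `tweets` data frame from the previous chunk; among them is the `description` column with the account's profile text:

```r
# list all columns returned by get_timeline()
names(tweets)

# the profile description travels along with every tweet
unique(tweets$description)
```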
But if you really only want the users' descriptions, you can simply use `lookup_users()`.
```r
wh <- c("JoeBiden", "POTUS", "VP", "FLOTUS")
users <- lookup_users(users = wh)
users$description
```
If you are interested in networks, you can also download friends and followers. Friends are the people the account you query follows, while followers are the people who follow that account.
For example, let us look at the friends and followers of the account of my home department: the Political Science Department of the University of Zurich. Of course, you can replace that with your department's account:
```r
followers <- get_followers("IPZ_ch")
friends <- get_friends("IPZ_ch")
```
We can also compare these lists --- whom does my department follow, who follows my department?
```r
follower_network <- friends %>%
  select(user_id) %>%
  mutate(ipz_follows = TRUE) %>%
  full_join(
    mutate(followers, follows_ipz = TRUE),
    by = "user_id"
  ) %>%
  mutate(across(c(ipz_follows, follows_ipz), ~ replace_na(.x, FALSE)))

# tabulate: who follows whom?
with(follower_network, table(follows_ipz, ipz_follows))
```
What are the most common words that friends of my department's account use to describe themselves on Twitter?
We now do a bit more text analysis, visualizing the results as a word cloud with the `quanteda` R package.
This allows us to remove things like URLs or non-meaningful words before we visualize the user descriptions.
```r
# extract profile descriptions
users <- lookup_users(users = friends$user_id)

# create a corpus of user descriptions
corp <- users %>%
  filter(!is.na(description), trimws(description) != "") %>%
  select(user_id, screen_name, name, description, account_created_at) %>%
  corpus(text_field = "description", docid_field = "user_id")

# convert to a document-feature matrix (bag of words)
bow <- corp %>%
  tokens(
    remove_punct = TRUE,
    remove_symbols = TRUE,
    remove_url = TRUE
  ) %>%
  tokens_remove(c(stopwords("en"), stopwords("de"))) %>%
  dfm()

topfeatures(bow, n = 15)

# create a word cloud
textplot_wordcloud(
  bow,
  rotation = 0,
  min_size = 1,
  max_size = 5,
  max_words = 100
)
```
The REST API also offers a long list of other endpoints that could be of use in your projects.
You can use `search_users()` to search for users related to specific keywords based on their self-descriptions. For example, you might want to look into users that are interested in methods, politics, and so on:
```r
usrs <- search_users(q = "data journalism", n = 10)
usrs$screen_name
```
If you know the ID of a tweet, you can download it directly from the API using `lookup_statuses()`. This is useful because tweets cannot be redistributed as part of the replication materials of a published paper, but the list of tweet IDs can be shared:
```r
# downloading tweets when you know their IDs
status <- lookup_statuses(statuses = c("896523232098078720"))
status$text
```
"Lists" of Twitter users, compiled by other users, are also accessible through the API. There are many lists of politicians, leaders etc. that you can use.
You can search by 'slug' (that is, the list's name) if you specify the list owner; otherwise, you have to find the `list_id`. Try to find information about this list of world leaders.
```r
# download user information from a list
world_leaders <- lists_members(
  list_id = "1044685725369815040",
  owner_user = "@TwitterGov"
)
world_leaders
```
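Since we know the owner, the same list could presumably also be queried by its slug instead of its `list_id`. A hedged sketch (the slug "world-leaders" is an assumption derived from the list's name, not verified):

```r
# sketch: query the same list via slug + owner instead of list_id
# (the slug "world-leaders" is assumed, not verified)
world_leaders <- lists_members(
  slug = "world-leaders",
  owner_user = "TwitterGov"
)
```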
This is also useful if, for example, you are interested in compiling lists of journalists, because media outlets offer such lists in their profiles.
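To act on this idea, a minimal sketch: retrieve the lists an outlet's account owns or subscribes to with `lists_users()`, then pass one of them to `lists_members()` as above (the handle "nytimes" is an illustrative example):

```r
# sketch: discover the lists an outlet's account subscribes to or owns
# ("nytimes" is an illustrative handle)
outlet_lists <- lists_users(user = "nytimes")
outlet_lists
```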
The opposite approach is to search for lists that contain a certain user with `lists_memberships()`. For example, you could look for all lists that contain Joe Biden.
```r
biden_lists <- lists_memberships(user = "JoeBiden")
biden_lists
```
Finally, you can retrieve the list of users who retweeted a particular tweet with `get_retweets()`. Unfortunately, it is limited to the 100 most recent retweets.
```r
rts <- get_retweets(status_id = "896523232098078720")
```
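A quick way to see who retweeted, assuming the returned data frame contains the usual rtweet user columns such as `screen_name`:

```r
# peek at the retweeting accounts (screen_name column assumed)
head(rts$screen_name)
```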