Scraping Twitter

My goals in scraping my Twitter feed are:

- archive my own home timeline (not just my own tweets)
- archive my favorites
- archive tweets for particular hashtags

The plan is for each type of archiving to have a separate Rscript, each run by its own cron job, so that they don't step on each other's toes. In addition, the actual archive files will be separate, so that we don't have to worry about access control, simultaneous writes, etc.
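
As a sketch of that setup, the cron jobs might look something like the following (the script names and paths are hypothetical, just to illustrate one job per archive type):

# hypothetical crontab: one entry per archive type, staggered schedules
0 * * * *    Rscript /path/to/socialmediaArchive/archiveHome.R
0 6,18 * * * Rscript /path/to/socialmediaArchive/archiveFavorites.R
30 * * * *   Rscript /path/to/socialmediaArchive/archiveHashtags.R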

Limits

The Twitter 1.1 API states that an application can make 15 requests every 15 minutes. So for archiving my own feed, I don't imagine making requests more than once an hour. Favorites can probably be archived morning and night (every 12 hours), and hashtags can probably also be done once an hour (or less often) depending on the hashtag.
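
Before scheduling anything, it is worth checking how many calls remain in the current window; twitteR exposes this through getCurRateLimitInfo(). A minimal sketch, assuming credentials have already been registered as in the script below:

library(twitteR)
# returns a data.frame of resources with their limit, remaining calls,
# and when the current 15-minute window resets
limits <- getCurRateLimitInfo()
head(limits)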

Archiving my own feed

I want to maintain an archive of my own feed from here on out: not just my own tweets, but my actual user stream (the home timeline).

archiveLoc <- "C:/Users/rmflight/Documents/coding_projects/socialmediaArchive/twitter"
library(twitteR)
load("keys/twitteRCredentials.RData")

registerTwitterOAuth(twitterCredentials)
homeData <- file.path(archiveLoc, "homeData.RData")
homeText <- file.path(archiveLoc, "homeText.txt")

# load the previously archived list of status objects (currTweets)
load(homeData)

# find the newest tweet ID already archived; tweet IDs are 64-bit integers,
# so keep them as characters and order by length then value instead of
# using as.numeric, which loses precision on IDs this large
allID <- sapply(currTweets, function(x){x$id})
lastID <- allID[order(nchar(allID), allID)][length(allID)]

# fetch only tweets newer than the archive
newTweets <- homeTimeline(n=100, sinceID=lastID)

if (length(newTweets) > 0){
  currTweets <- c(currTweets, newTweets)
  save(currTweets, file=homeData)

  # also append a plain-text version, one tweet per line, using the
  # textTweet formatter defined elsewhere in this package
  textTweets <- lapply(newTweets, textTweet)
  textTweets <- paste(textTweets, sep="", collapse="\n")
  cat(textTweets, "\n", file=homeText, append=TRUE, sep="")
}
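
textTweet() above is this package's own formatter and its definition isn't shown here. Purely as an illustration, a hypothetical version built from the fields a twitteR status object exposes might look like:

# hypothetical formatter: one line per tweet with time, author, and text
textTweet <- function(tw){
  paste(as.character(tw$created), tw$screenName, tw$text, sep=" | ")
}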

