View source: R/screen_scrape_tweets.R
Given a Twitter account screen name or ID and start and end dates,
the function screen-scrapes the IDs of historical tweets in that time range and returns them
in a data frame.
Optionally, the scraped IDs can also be written to disk (if write.out = TRUE).
scrape_tweet_ids(tw.account, remdr, since.date, until.date,
  date.interval = "month", max.tweets.pi = 10000, write.out = TRUE,
  write.out.path,
  write.out.name = sprintf("tw_user_%s_tweet_ids_%s.json", tw.account,
    paste0(since.date, "_to_", until.date)), sleep = 0.5,
  .scroll.sleep = 0.75, verbose = TRUE)
tw.account |
a scalar character vector, specifying a Twitter screen name or account ID |
remdr |
an active RSelenium |
since.date |
Creation date of the oldest tweets to get. Only accepts dates in format '%Y-%m-%d' (year-month-day: 'YYYY-mm-dd') |
until.date |
Creation date of the most recent (youngest) tweets to get. Only accepts dates in format '%Y-%m-%d' (year-month-day: 'YYYY-mm-dd') |
date.interval |
date interval passed to 'by' argument of |
max.tweets.pi |
Maximum number of tweets to load per interval. Defaults to 10'000. (See the Details section.) |
write.out |
Logical. Write out tweet IDs as JSON to disk?
If |
write.out.path |
Output path (directory in which to write the file of scraped IDs).
Will be ignored if |
write.out.name |
JSON file name.
Defaults to 'tw_user_< |
sleep |
Seconds to pause between date ranges when iterating over date intervals defined by
|
.scroll.sleep |
Seconds to pause between scrolls when scrolling for more tweets. Defaults to 0.75 seconds. (See section 'scroll sleep' for details.) |
verbose |
logical. Print out status messages? |
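The default for write.out.name shown in the usage can be evaluated on its own to see what file name it produces. The following is a small sketch only: the account name and dates are made-up values for illustration, and the sprintf() pattern is copied from the usage signature.

```r
# Hypothetical inputs, chosen only for illustration.
tw.account <- "example_user"
since.date <- "2019-01-01"
until.date <- "2019-03-31"

# Same expression as the default of write.out.name in the usage above.
default.name <- sprintf("tw_user_%s_tweet_ids_%s.json", tw.account,
                        paste0(since.date, "_to_", until.date))

print(default.name)
# [1] "tw_user_example_user_tweet_ids_2019-01-01_to_2019-03-31.json"
```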
Note that the maximum number of tweets loaded per date interval (max.tweets.pi) needs to be adapted to the date interval.
Each scroll loads 20 new tweets.
By default, there is a pause of 0.75 seconds between scrolls.
This means that waiting for 10'000 tweets to load takes at most ((10000/20) * 0.75)/60 = 6.25 minutes.
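The same arithmetic can be wrapped in a small helper to try other settings. This is a sketch, not part of the package: estimate_scroll_minutes() and its tweets.per.scroll argument are hypothetical names, and the figure of 20 tweets per scroll is taken from the text above. It covers only the scroll pauses within one date interval, not the sleep between intervals or page load time.

```r
# Upper bound on the scroll waiting time per date interval, in minutes.
# Hypothetical helper; not a function of the package.
estimate_scroll_minutes <- function(max.tweets.pi = 10000,
                                    .scroll.sleep = 0.75,
                                    tweets.per.scroll = 20) {
  # Number of scrolls needed to reach the tweet limit.
  n.scrolls <- ceiling(max.tweets.pi / tweets.per.scroll)
  # Total pause in seconds, converted to minutes.
  n.scrolls * .scroll.sleep / 60
}

estimate_scroll_minutes()                       # 6.25 (the defaults)
estimate_scroll_minutes(max.tweets.pi = 2000)   # 1.25
```

For a shorter date.interval, a lower max.tweets.pi is usually enough, which keeps the worst-case wait per interval correspondingly short.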
A tibble data frame.
The data frame is empty if an error occurs or if no tweet IDs were scraped in the given time range.
Otherwise it has the columns 'account' (<chr>), 'since' (<date>), 'until' (<date>) and 'tweet_id' (<chr>),
with one row per tweet.
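Because an empty data frame can mean either an error or simply no tweets, it may be worth checking the result before using it. The sketch below uses a hand-built base R data frame as a stand-in for the returned tibble. All values are invented for illustration, and only the column names and types follow the description above.

```r
# Hypothetical stand-in for a returned result (values are made up).
res <- data.frame(
  account  = c("example_user", "example_user"),
  since    = as.Date(c("2019-01-01", "2019-01-01")),
  until    = as.Date(c("2019-02-01", "2019-02-01")),
  tweet_id = c("1080000000000000001", "1080000000000000002"),
  stringsAsFactors = FALSE
)

if (nrow(res) == 0) {
  message("No tweet IDs returned (error or nothing found in the time range).")
} else {
  n_ids <- nrow(res)                                  # one row per tweet
  has_duplicates <- anyDuplicated(res$tweet_id) > 0   # sanity check on IDs
  message(sprintf("%d tweet IDs, duplicates: %s", n_ids, has_duplicates))
}
```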
The argument .scroll.sleep
determines how much time the Twitter timeline is given to fully load after each scroll.
WARNING: Setting low values (< 0.75 seconds) risks missing tweet IDs,
because scraping can stop prematurely when the scroll sleep is too short.
The default of 0.75 seconds is a minimum that assumes a fast internet connection.
The function presupposes an active remote Selenium driver.
The function only accepts dates in format '%Y-%m-%d' (year-month-day: 'YYYY-mm-dd').
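Since only 'YYYY-mm-dd' dates are accepted, it can help to validate since.date and until.date before starting a browser session. The following is a minimal sketch using base R only. is_valid_ymd() is a hypothetical helper, not part of the package, and the package's own check may work differently.

```r
# Hypothetical helper: TRUE for strings that are valid dates in 'YYYY-mm-dd' form.
is_valid_ymd <- function(x) {
  # Strict shape check first, because as.Date() alone tolerates
  # inputs such as "2019-1-5" or trailing characters.
  shape.ok <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  # Then check that the string is a real calendar date (NA if not).
  date.ok <- !is.na(as.Date(x, format = "%Y-%m-%d"))
  shape.ok & date.ok
}

is_valid_ymd("2019-01-31")   # TRUE
is_valid_ymd("31.01.2019")   # FALSE: wrong format
is_valid_ymd("2019-02-30")   # FALSE: not a calendar date
```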