Description Usage Arguments Details Value Scroll sleep WARNING
View source: R/screen_scrape_tweets.R
Given a Twitter account screen name or ID and start and end dates,
the function screen-scrapes the IDs of historical tweets in that time range and returns them
in a data frame.
Optionally, the scraped IDs can also be written to disk (if write.out = TRUE).
scrape_tweet_ids(tw.account, remdr, since.date, until.date,
  date.interval = "month", max.tweets.pi = 10000, write.out = TRUE,
  write.out.path,
  write.out.name = sprintf("tw_user_%s_tweet_ids_%s.json", tw.account,
    paste0(since.date, "_to_", until.date)), sleep = 0.5,
  .scroll.sleep = 0.75, verbose = TRUE)
tw.account | a scalar character vector specifying a Twitter screen name or account ID
remdr | an active RSelenium
since.date | creation date of the oldest tweets to get. Only accepts dates in format '%Y-%m-%d' (year-month-day: 'YYYY-mm-dd')
until.date | creation date of the most recent (youngest) tweets to get. Only accepts dates in format '%Y-%m-%d' (year-month-day: 'YYYY-mm-dd')
date.interval | date interval passed to 'by' argument of
max.tweets.pi | maximum number of tweets per interval to load. Defaults to 10'000. (See Details section.)
write.out | logical. Write out tweet IDs as JSON to disk? If
write.out.path | write-out path (directory where to write the scraped IDs file). Will be ignored if
write.out.name | JSON file name. Defaults to 'tw_user_<
sleep | seconds to pause between date ranges when iterating over date intervals defined by
.scroll.sleep | seconds to pause between scrolls when scrolling for more tweets. Defaults to .75 seconds. (See section 'Scroll sleep' for details.)
verbose | logical. Print out status messages?
Note that the maximum number of tweets loaded per date interval (max.tweets.pi) needs to be adapted to the date interval.
Per scroll, 20 new tweets are loaded.
By default, there is a pause of .75 seconds between scrolls.
This means that waiting for 10'000 tweets to load takes at most ((10000/20) * .75)/60 = 6.25 minutes.
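The arithmetic above can be reproduced for other settings with a small helper. This is only a sketch: the function name estimate_scroll_minutes and its argument tweets_per_scroll are hypothetical and not part of the package, and the figure of 20 tweets per scroll is taken from the text above.

```r
# Hypothetical helper (not part of the package): upper bound, in minutes,
# on the time spent pausing between scrolls within one date interval.
# Assumes 20 tweets are loaded per scroll, as stated in the Details text.
estimate_scroll_minutes <- function(max.tweets.pi = 10000,
                                    .scroll.sleep = 0.75,
                                    tweets_per_scroll = 20) {
  # Number of scrolls needed to reach the per-interval tweet limit
  n_scrolls <- ceiling(max.tweets.pi / tweets_per_scroll)
  # Total pause time in seconds, converted to minutes
  (n_scrolls * .scroll.sleep) / 60
}

default_minutes <- estimate_scroll_minutes()                     # defaults from Usage
smaller_minutes <- estimate_scroll_minutes(max.tweets.pi = 2000) # smaller limit
```

With the defaults this gives the 6.25 minutes quoted above; lowering max.tweets.pi for shorter date intervals reduces the worst-case wait proportionally.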
A tibble data frame.
The data frame is empty if an error occurs or no tweet IDs were scraped in the given time range.
Otherwise it has columns 'account' (<chr>), 'since' (<date>), 'until' (<date>) and 'tweet_id' (<chr>),
with one row per tweet.
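As a rough illustration of handling the return value, the sketch below builds a stand-in object with the documented columns and checks for the empty case. It uses a base data.frame instead of a tibble so that it runs without extra packages, and all values in it (account name, dates, IDs) are invented placeholders, not real output.

```r
# Stand-in for the returned data frame; every value here is a made-up placeholder.
ids <- data.frame(
  account  = c("some_account", "some_account"),
  since    = as.Date(c("2018-01-01", "2018-01-01")),
  until    = as.Date(c("2018-02-01", "2018-02-01")),
  tweet_id = c("950000000000000001", "950000000000000002"),
  stringsAsFactors = FALSE
)

# An empty data frame signals an error or no tweets in the given time range.
if (nrow(ids) == 0) {
  n_unique <- 0L
  message("No tweet IDs were scraped.")
} else {
  # tweet_id is character; keep it that way, since converting such long IDs
  # to numeric would lose precision.
  n_unique <- length(unique(ids$tweet_id))
}
```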
Argument .scroll.sleep determines how much time the Twitter timeline has to fully load.
WARNING: Setting low values (<.75 seconds) risks missing tweet IDs,
as the scraping process can abort prematurely when the scroll sleep is too short.
The default setting of .75 seconds is a minimum, assuming a fast internet connection.
The function presupposes an active remote Selenium driver.
The function only accepts dates in format '%Y-%m-%d' (year-month-day: 'YYYY-mm-dd').
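Because only 'YYYY-mm-dd' dates are accepted, it may help to check inputs before starting a scrape. The helper below is a hypothetical pre-check written for illustration; it is not part of the package and does not necessarily mirror how the function validates its arguments.

```r
# Hypothetical pre-check (not part of the package): TRUE if x is a single
# character string holding a valid calendar date written as 'YYYY-mm-dd'.
is_ymd <- function(x) {
  is.character(x) && length(x) == 1 &&
    # Shape check: four digits, two digits, two digits, separated by hyphens
    grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x) &&
    # Calendar check: as.Date() returns NA for impossible dates
    !is.na(as.Date(x, format = "%Y-%m-%d"))
}

ok_date     <- is_ymd("2018-01-31")  # well-formed and valid
wrong_order <- is_ymd("31.01.2018")  # wrong format
impossible  <- is_ymd("2018-02-30")  # right format, no such date
```

Such a check could be run on since.date and until.date before calling scrape_tweet_ids, so that a malformed date is caught before the Selenium session is used.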