You may not want to change your existing workflows to use the httr GET and
POST helpers. It is not uncommon to lapply or purrr::map a series of httr
verb calls into a list of response objects. Those who have been bitten by the
intermittent HTTP errors that cause scraping loops to fail will also likely be
using purrr::safely to wrap httr verb calls to ensure the loop succeeds in
capturing some information.
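As a minimal sketch of the pattern described above (assuming the httr and purrr packages are installed; the name safe_GET is only illustrative), a safely-wrapped verb call returns a two-element list for every URL instead of stopping the loop on the first error:

```r
library(httr)
library(purrr)

# safely() returns a wrapper that never throws: each call yields
# list(result = <value or NULL>, error = <condition or NULL>)
safe_GET <- purrr::safely(httr::GET)   # illustrative name

urls <- c("https://rud.is/", "https://rud.is/b/")

# One list element per URL, even if some requests fail
res_list <- purrr::map(urls, safe_GET)

# Logical vector marking the requests that completed without an error
ok <- purrr::map_lgl(res_list, ~ is.null(.x$error))
sum(ok)
```

The resulting res_list is the kind of "wrapped" response list this function is meant to accept.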
response_list_to_warc_file(httr_response_list, path, gzip = TRUE,
  warc_date = Sys.time(), warc_record_id = NULL,
  warc_info = list(software = sprintf("jwatr %s", packageVersion("jwatr")),
                   format = "WARC File Format 1.0"))
httr_response_list | a list of
path | path (dir + base file name) to the created WARC file
gzip | should the WARC file be gzip'd?
warc_date | A supplied
warc_record_id | A unique identifier for the WARC record. If not provided one will be generated with
warc_info | a named
This function makes it easy to turn a list of these response objects (wrapped
or plain) into a WARC file. Sure, you can save an R list to an R data file,
but that won't be usable by folks outside the R ecosystem. Plus, there are
scads of tools that can work with WARC files, including those in large-scale
data processing environments.

List elements that are not plain or "safe" response objects will be gracefully
skipped over.
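A rough sketch of the mixed-input case, assuming jwatr, httr and purrr are installed and the URLs are reachable. The names safe_GET, mixed and out_dir, and the base file name "crawl", are illustrative and not part of the package. The stray character element is there only to exercise the skipping behavior described above:

```r
library(jwatr)

# Wrap the verb so failed requests become list(result = NULL, error = <condition>)
safe_GET <- purrr::safely(httr::GET)   # illustrative name

urls <- c("https://rud.is/", "https://rud.is/b/")
res_list <- purrr::map(urls, safe_GET)

# Append something that is neither a plain nor a "safe" response;
# per the documentation above it should be skipped, not cause an error
mixed <- c(res_list, list("not a response"))

# Write into a dedicated temporary directory so cleanup does not
# depend on whatever extension the function adds to the base name
out_dir <- tempfile("warc-")
dir.create(out_dir)
response_list_to_warc_file(mixed, file.path(out_dir, "crawl"))

list.files(out_dir)                    # inspect what was written
unlink(out_dir, recursive = TRUE)      # remove the directory and its contents
```

Writing into a throwaway directory keeps the cleanup step independent of the exact file name the function produces.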
## Not run:
library(jwatr)

urls <- c("https://rud.is/", "https://rud.is/b/")
res_list <- lapply(urls, httr::GET)       # one response object per URL
tf <- tempfile()                          # base path for the WARC file
response_list_to_warc_file(res_list, tf)
unlink(tf)
## End(Not run)