req_perform_parallel: Perform a list of requests in parallel

View source: R/multi-req.R

Description

This variation on req_perform_sequential() performs multiple requests in parallel. Exercise caution when using this function; it's easy to pummel a server with many simultaneous requests. Only use it with hosts designed to serve many files at once, which are typically web servers, not API servers.

req_perform_parallel() has a few limitations:

  • Will not retrieve a new OAuth token if it expires part way through the requests.

  • Does not perform throttling with req_throttle().

  • Does not attempt retries as described by req_retry().

  • Only consults the cache set by req_cache() before/after all requests.

If any of these limitations are problematic for your use case, we recommend req_perform_sequential() instead.
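If only the lack of throttling is a concern, one crude workaround is to split the requests into small batches and pause between them. The following is a minimal sketch, not part of the httr2 API; the batch size, the pause length, and the reqs object are assumptions.

# Minimal sketch: process `reqs` in small batches with a pause in between,
# as a rough substitute for req_throttle() (batch size and pause are assumed).
chunk_size <- 10
chunks <- split(reqs, ceiling(seq_along(reqs) / chunk_size))
resps <- list()
for (chunk in chunks) {
  resps <- c(resps, req_perform_parallel(chunk, on_error = "continue"))
  Sys.sleep(1)
}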

Usage

req_perform_parallel(
  reqs,
  paths = NULL,
  pool = NULL,
  on_error = c("stop", "return", "continue"),
  progress = TRUE
)

Arguments

reqs

A list of requests.

paths

An optional list of paths, if you want to download the response bodies to disk. If supplied, must be the same length as reqs.

pool

Optionally, a curl pool made by curl::new_pool(). Supply this if you want to override the defaults for total concurrent connections (100) or concurrent connections per host (6).
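For example, a smaller pool keeps the load on any one host modest. A minimal sketch, assuming reqs already exists; the connection counts are illustrative only:

# Illustrative: cap total connections at 20 and per-host connections at 2.
pool <- curl::new_pool(total_con = 20, host_con = 2)
resps <- req_perform_parallel(reqs, pool = pool)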

on_error

What should happen if one of the requests fails?

  • stop, the default: stop iterating with an error.

  • return: stop iterating, returning all the successful responses received so far, as well as an error object for the failed request.

  • continue: continue iterating, recording errors in the result.

progress

Display a progress bar? Use TRUE to turn on a basic progress bar, use a string to give it a name, or see progress_bars to customise it in other ways.
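For example, passing a string gives the progress bar a label (a sketch; the label text is arbitrary):

resps <- req_perform_parallel(reqs, progress = "Fetching pages")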

Value

A list, the same length as reqs, containing responses and possibly error objects if on_error is "return" or "continue" and one of the requests fails. If on_error is "return" and the ith request errors, the ith element of the result will be an error object and the remaining elements will be NULL. If on_error is "continue", the result will be a mix of responses and error objects.

Only httr2 errors are captured; see req_error() for more details.
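Because positions are preserved, element i of the result corresponds to element i of reqs. A minimal sketch of locating failures by position when on_error is "continue" (the built-in helpers shown in the Examples are usually more convenient):

# Positions in `resps` match positions in `reqs`, so logical indexing
# recovers the requests that failed.
failed <- vapply(resps, inherits, logical(1), what = "error")
reqs[failed]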

Examples

# Requesting these 4 pages one at a time would take 2 seconds:
request_base <- request(example_url())
reqs <- list(
  request_base |> req_url_path("/delay/0.5"),
  request_base |> req_url_path("/delay/0.5"),
  request_base |> req_url_path("/delay/0.5"),
  request_base |> req_url_path("/delay/0.5")
)
# But it's much faster if you request in parallel
system.time(resps <- req_perform_parallel(reqs))

# req_perform_parallel() will fail on error
reqs <- list(
  request_base |> req_url_path("/status/200"),
  request_base |> req_url_path("/status/400"),
  request("FAILURE")
)
try(resps <- req_perform_parallel(reqs))

# but can use on_error to capture all successful results
resps <- req_perform_parallel(reqs, on_error = "continue")

# Inspect the successful responses
resps |> resps_successes()

# And the failed responses
resps |> resps_failures() |> resps_requests()
