fetchR: Fetch a List of URLs.

View source: R/fetchR.R

Description

Based on the curl package (a wrapper for libcurl). The list of URLs to fetch is organized into batches, with each batch containing at most one URL per host. This provides a convenient way to avoid hitting a server too often; a delay also kicks in if a host is being queried too quickly.
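
As a rough sketch of the batching idea described above (an illustration only, not the package's internal code), a fetch list can be split so that no batch repeats a host:

urls <- c(
  "https://example.com/a",
  "https://example.com/b",
  "https://example.org/x"
)

## Extract the host from each URL.
hosts <- sub("^https?://([^/]+).*$", "\\1", urls)

## Number each URL within its host; URLs sharing a number form one
## batch, so each batch contains at most one URL per host.
batch_id <- stats::ave(seq_along(urls), hosts, FUN = seq_along)
batches  <- split(urls, batch_id)
## batches[[1]] holds one URL from example.com and one from example.org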

Usage

fetchR(
  out_dir = NULL,
  work_dir = NULL,
  fetch_list = NULL,
  crawl_delay = NULL,
  max_concurr = NULL,
  max_host = NULL,
  timeout = Inf,
  queue_scl = 1,
  comments = "",
  save_to_disk = TRUE,
  return = FALSE,
  log_file = NULL
)

Arguments

out_dir

(Required) Path to the output directory.

work_dir

(Required) Path to the working directory.

fetch_list

(Required) The list of URLs to fetch, as created by generateR.R.

crawl_delay

Minimum delay (in seconds) between successive calls to the same host.

max_concurr

Maximum total number of concurrent connections open at any given time.

max_host

Maximum number of concurrent connections per host at any given time.

timeout

Timeout for requests; the default Inf waits indefinitely.

queue_scl

Scaling factor for the fetch queue.

comments

Comments to print while running.

save_to_disk

Logical; if TRUE, save the fetched output to disk.

return

Logical; if TRUE, return the fetched output.

log_file

Name of the log file. If NULL, messages are written to stdout().

Value

None.
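
Examples

A hypothetical call illustrating the arguments above. The directories, numeric values, and log file name are placeholders, and fetch_list stands in for a fetch list produced by generateR.R.

fetchR(
  out_dir      = "./out/",     # where fetched pages are written
  work_dir     = "./work/",    # working directory for the crawl
  fetch_list   = fetch_list,   # fetch list created by generateR.R
  crawl_delay  = 30,           # wait 30 seconds between calls to a host
  max_concurr  = 50,           # at most 50 open connections in total
  max_host     = 1,            # at most 1 open connection per host
  timeout      = 60,           # stop waiting on a request after 60 seconds
  save_to_disk = TRUE,         # write output to out_dir
  return       = FALSE,        # do not also return the output in R
  log_file     = "fetch.log"   # log progress here instead of stdout()
)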

