fetchR: Fetch a List of URLs.

View source: R/fetchR.R

Description

Based on the curl package (a wrapper for libcurl). The list of URLs to fetch is organized into batches, with each batch containing at most one URL per host. This provides a convenient way to avoid hitting a server too often; a delay also kicks in if a host is being queried too quickly.
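
As a rough sketch of the batching idea described above (an illustration only, not the package's internal code), a fetch list can be split so that no batch repeats a host:

urls <- c(
  "https://example.com/a",
  "https://example.com/b",
  "https://example.org/x"
)

## Extract the host from each URL.
hosts <- sub("^https?://([^/]+).*$", "\\1", urls)

## Number each URL within its host; URLs sharing a number form one
## batch, so each batch contains at most one URL per host.
batch_id <- stats::ave(seq_along(urls), hosts, FUN = seq_along)
batches  <- split(urls, batch_id)
## batches[[1]] holds one URL from example.com and one from example.org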

Usage

fetchR(
  out_dir = NULL,
  work_dir = NULL,
  fetch_list = NULL,
  crawl_delay = NULL,
  max_concurr = NULL,
  max_host = NULL,
  timeout = Inf,
  queue_scl = 1,
  comments = "",
  save_to_disk = TRUE,
  return = FALSE,
  log_file = NULL
)

Arguments

out_dir

(Required) Path to the output directory.

work_dir

(Required) Path to the working directory.

fetch_list

(Required) The list of URLs to fetch, as created by generateR.R.

crawl_delay

Minimum delay (in seconds) between successive calls to the same host.

max_concurr

Maximum total number of concurrent connections open at any given time.

max_host

Maximum number of concurrent connections per host at any given time.

timeout

Timeout for requests; the default Inf waits indefinitely.

queue_scl

Scaling factor for the fetch queue.

comments

Comments to print while running.

save_to_disk

Logical; if TRUE, save the fetched output to disk.

return

Logical; if TRUE, return the fetched output.

log_file

Name of the log file. If NULL, messages are written to stdout().

Value

None.
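
Examples

A hypothetical call illustrating the arguments above. The directories, numeric values, and log file name are placeholders, and fetch_list stands in for a fetch list produced by generateR.R.

fetchR(
  out_dir      = "./out/",     # where fetched pages are written
  work_dir     = "./work/",    # working directory for the crawl
  fetch_list   = fetch_list,   # fetch list created by generateR.R
  crawl_delay  = 30,           # wait 30 seconds between calls to a host
  max_concurr  = 50,           # at most 50 open connections in total
  max_host     = 1,            # at most 1 open connection per host
  timeout      = 60,           # stop waiting on a request after 60 seconds
  save_to_disk = TRUE,         # write output to out_dir
  return       = FALSE,        # do not also return the output in R
  log_file     = "fetch.log"   # log progress here instead of stdout()
)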

