View source: R/fetchR_parseR_edit.R
fetchR_parseR_edit | R Documentation |
Fetches the list of URLs created by the generateR() function.
fetchR_parseR_edit(
out_dir = NULL,
work_dir = NULL,
fetch_list = NULL,
crawl_delay = NULL,
max_concurr = NULL,
max_concurr_host = NULL,
timeout = Inf,
timeout_request = NULL,
queue_scl = 1,
comments = "",
log_file = NULL,
readability_content = FALSE,
parser = crawlR::parse_content,
writer = NULL,
status_print_interval = 500,
curl_opts = list(
  `User-Agent` = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36",
  `Accept-Language` = "en;q=0.7",
  Connection = "close",
  CURLOPT_DNS_CACHE_TIMEOUT = "3600"
)
)
out_dir |
(Required) Current output directory. |
work_dir |
(Required) Current working directory. |
fetch_list |
(Required) List of URLs to fetch, as created by generateR(). |
crawl_delay |
Delay (in seconds) between calls to the same host. |
max_concurr |
Max. total concurrent connections open at any given time. |
max_concurr_host |
Max. total concurrent connections per host at any given time. |
timeout |
Total timeout across all requests. Defaults to Inf. |
timeout_request |
Timeout for each individual request. |
queue_scl |
Queue scaling factor. Defaults to 1. |
comments |
Some comments to print while running. |
log_file |
Name of the log file. If NULL, writes to stdout(). |
readability_content |
Logical flag (TRUE or FALSE). Defaults to FALSE. |
parser |
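Function used to parse fetched content. Defaults to crawlR::parse_content. |
writer |
Placeholder to allow custom output functions. Defaults to NULL. |
status_print_interval |
Number of URLs fetched between crawler status outputs. Defaults to 500. |
curl_opts |
Named list of curl options, such as the User-Agent and Accept-Language headers shown in the usage. |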
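The arguments above can be assembled as a named list before the call. The following is a minimal sketch rather than an example from the package: the directory paths and numeric settings are hypothetical, fetch_list is left as a placeholder because its structure comes from generateR() and is not documented here, and the commented-out call assumes the function is exported from crawlR. Note that supplying curl_opts replaces the whole default list, so the defaults from the usage are copied and then modified.

```r
# Defaults copied from the usage section, so that overriding one option
# does not silently drop the others.
default_curl_opts <- list(
  `User-Agent` = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36",
  `Accept-Language` = "en;q=0.7",
  Connection = "close",
  CURLOPT_DNS_CACHE_TIMEOUT = "3600"
)

# Override a single option; the value "de;q=0.9" is only an illustration.
my_curl_opts <- modifyList(default_curl_opts,
                           list(`Accept-Language` = "de;q=0.9"))

# Hypothetical settings for one fetch run.
args <- list(
  out_dir          = file.path(tempdir(), "crawl_out"),   # hypothetical path
  work_dir         = file.path(tempdir(), "crawl_work"),  # hypothetical path
  fetch_list       = NULL,  # placeholder: replace with the output of generateR()
  crawl_delay      = 5,     # hypothetical value, in seconds
  max_concurr      = 50,    # hypothetical value
  max_concurr_host = 2,     # hypothetical value
  log_file         = NULL,  # NULL writes the log to stdout()
  curl_opts        = my_curl_opts
)

# Check the three arguments documented as required before calling.
required <- c("out_dir", "work_dir", "fetch_list")
missing_args <- required[vapply(args[required], is.null, logical(1))]

## Not run:
# Assumes fetchR_parseR_edit is exported from crawlR.
# if (length(missing_args) == 0) {
#   do.call(crawlR::fetchR_parseR_edit, args)
# }
## End(Not run)
```

With the placeholder still in place, missing_args names fetch_list, so the guarded call would be skipped until a real fetch list is supplied.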