fetch_all_detailpage_html: Fetch HTML of detail page URLs

Description

Retrieves the HTML for a vector of detail page URLs and returns both the URLs and the HTML.

Usage

fetch_all_detailpage_html(detailpageurls, max_items = Inf,
  delay_detailpage = NA)

Arguments

detailpageurls

A character vector of URLs pointing to sub-pages with detailed product descriptions (the pages reached by following a link on a listing page).

max_items

A numeric vector of length one (a whole number, or Inf) specifying the maximum number of items to scrape (default: Inf). If max_items is smaller than the number of URLs in detailpageurls, only the first max_items URLs are fetched.
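The truncation rule for max_items can be sketched in base R. This is only an illustration of the documented behaviour, not the package's implementation: limit_urls is a hypothetical helper, and the example.com URLs are placeholders.

```r
# Hypothetical helper illustrating the documented truncation rule;
# it is not part of rgeizhals.
limit_urls <- function(detailpageurls, max_items = Inf) {
  # min() copes with Inf, so the default keeps every URL
  n_keep <- min(length(detailpageurls), max_items)
  detailpageurls[seq_len(n_keep)]
}

# Placeholder URLs, assumed for illustration only
urls <- c("https://example.com/a",
          "https://example.com/b",
          "https://example.com/c")

limit_urls(urls, max_items = 2)  # keeps the first two URLs
limit_urls(urls)                 # default Inf keeps all three
```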

delay_detailpage

Number of seconds to wait after fetching the HTML of each detail page (default: NA).
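A per-page delay of this kind is commonly implemented with Sys.sleep() inside the fetch loop. The sketch below shows one plausible arrangement. The helper names fetch_with_delay and fake_fetch are hypothetical, and treating NA as "do not pause" is an assumption, since this page does not say how the package handles NA.

```r
# Hypothetical loop showing where a per-page delay could sit;
# this is not rgeizhals code.
fetch_with_delay <- function(urls, fetch_one, delay = NA) {
  lapply(urls, function(u) {
    page <- fetch_one(u)
    # Assumption: NA means "no pause"; the package may behave differently
    if (!is.na(delay)) Sys.sleep(delay)
    page
  })
}

# Stand-in fetcher so the sketch runs without network access
fake_fetch <- function(u) paste0("<html>", u, "</html>")

pages <- fetch_with_delay(c("a", "b"), fake_fetch, delay = 0.1)
length(pages)  # one entry per input URL
```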

Value

A list of length two. The first element, url, contains the vector of URLs that was passed to the function. The second element, html, is itself a list with one entry per URL, holding the HTML of that page.
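The shape of the return value can be mimicked with a hand-built list, which shows how the two elements line up by position. This is a sketch under assumptions: the URLs and HTML strings are placeholders, and the real function may store parsed documents rather than plain strings.

```r
# Hand-built stand-in for the documented return value (placeholder data)
urls <- c("https://example.com/a", "https://example.com/b")
result <- list(
  url  = urls,
  html = lapply(urls, function(u) paste0("<html><!-- ", u, " --></html>"))
)

names(result)      # the two documented elements: url and html
result$url[2]      # URL of the second detail page
result$html[[2]]   # the HTML entry at the same position
```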

Examples

## Not run: 
## first, get data from all listing pages:
url_geizhals <- "https://geizhals.at/?cat=acam35"
listpagehtml_list <- fetch_all_listpages(url_geizhals, max_pages = 2)
dat_listpages <- parse_all_listpages(listpagehtml_list)

## now, get (first three) detailpages:
urls <- dat_listpages$detailpage_url
detailpagehtml_list <- fetch_all_detailpage_html(urls, max_items = 3,
  delay_detailpage = 1)
detailpagehtml_list

## End(Not run)

ingonader/rgeizhals documentation built on May 29, 2019, 3:05 a.m.