parse_all_detailpages: Parse data from multiple product detail pages


Description

Returns all categories and their values from a list of product detail pages, together with a summary of all price values from the price list on each detail page. In contrast to the parse_single_detailpage function, the categories describing a product are the columns, and each product is represented as a row in the resulting tibble (data.frame). The tibble has one column per distinct category; if a product's description does not include a particular category, that product's value in the corresponding column is NA. Column types are inferred automatically from the data. If returntype is set to "list", the data are returned as a list, without being combined into a data frame.
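The column-wise combination described above can be illustrated with a minimal base-R sketch. It is not the rgeizhals implementation: it assumes, hypothetically, that each parsed detail page is a named character vector mapping category names to values, and it uses utils::type.convert as a stand-in for whatever type inference the package actually performs. The helper name combine_detailpages is likewise hypothetical.

```r
# Minimal sketch, NOT the rgeizhals implementation.
# Assumption (hypothetical): each parsed detail page is a named character
# vector whose names are the categories and whose values are the entries.
combine_detailpages <- function(pages) {
  # union of all categories seen on any page, in order of first appearance
  all_cats <- unique(unlist(lapply(pages, names)))
  # one row per product; indexing by a category the page lacks yields NA
  rows <- lapply(pages, function(p) unname(p[all_cats]))
  mat <- do.call(rbind, rows)
  colnames(mat) <- all_cats
  df <- as.data.frame(mat, stringsAsFactors = FALSE)
  # infer column types from the character data (e.g. "24" becomes integer)
  df[] <- lapply(df, utils::type.convert, as.is = TRUE)
  df
}

# two hypothetical products; the second page has no "Sensor" category
pages <- list(
  c(Name = "Cam A", Sensor = "CMOS", Megapixel = "24"),
  c(Name = "Cam B", Megapixel = "45")
)
dat <- combine_detailpages(pages)
str(dat)
```

Here dat has two rows and three columns: Sensor is NA for the second product, and Megapixel is converted to integer while the text columns stay character.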

Usage

parse_all_detailpages(detailpagehtml_list, returntype = "data.frame")

Arguments

detailpagehtml_list

A list of HTML structures, one per geizhals detail page, each listing the details of a specific item.

returntype

Either "list" or "data.frame" (default).

Value

For returntype = "data.frame", a tibble (data.frame) with one column per distinct category found across all detail pages and one row per product. For returntype = "list", a list in which each entry contains the parsed data from a single detail page (the entries do not necessarily share the same categories).

Examples

## Not run: 
## get data from multiple geizhals category pages:
url_geizhals <- "https://geizhals.at/?cat=acam35"
listpagehtml_list <- fetch_all_listpages(url_geizhals, max_pages = 2)
dat_listpage <- parse_all_listpages(listpagehtml_list)
## pick only the first three detail page URLs:
wch_detailpage_urls <- dat_listpage[["detailpage_url"]][1:3]
detailpagehtml_list <- fetch_all_detailpage_html(wch_detailpage_urls)
## get data from all detailpages:
dat_detailpages <- parse_all_detailpages(detailpagehtml_list)
head(dat_detailpages)
## get the same data as a list:
dat_detailpages_list <- parse_all_detailpages(detailpagehtml_list,
                                              returntype = "list")
head(dat_detailpages_list)

## End(Not run)

ingonader/rgeizhals documentation built on May 29, 2019, 3:05 a.m.