parse_detailpage_urls: Parse urls of detail pages for items in geizhals category...


Description

Returns the URLs of all links to detail pages (product details) found on a geizhals category listing page, i.e., the page showing all items within a category, not the site-wide search from the search bar. If filters are applied on that page, only the results matching those filters are shown and scraped. The order of the items returned by the function may differ from the order on the webpage, but it is the same order in all related functions.

Usage

parse_detailpage_urls(listpagehtml, domain = "https://geizhals.at")

Arguments

listpagehtml

HTML structure of a single geizhals page listing the items in a selected category, as returned by xml2::read_html() or taken as a single element of the list of listing pages returned by fetch_all_listpages().

domain

Character vector of length one specifying the domain. Defaults to "https://geizhals.at".

Value

A character vector containing the URLs of the detail pages linked from the geizhals listing page.
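
To illustrate the kind of value described above, here is a minimal sketch of how detail-page URLs could be collected from a listing page and prefixed with the domain. It is not the package's actual implementation: the helper name sketch_detailpage_urls, the toy markup, and the class name "productlist__link" are all hypothetical, and the sketch assumes that the links in the page are relative paths.

```r
library(xml2)

# Toy listing page. The markup and the class name "productlist__link" are
# hypothetical; they only stand in for a real geizhals category page.
toy_html <- read_html(paste0(
  "<html><body>",
  "<a class='productlist__link' href='/item-a-a1.html'>Item A</a>",
  "<a class='productlist__link' href='/item-b-a2.html'>Item B</a>",
  "<a class='other' href='/about.html'>About</a>",
  "</body></html>"
))

# Hypothetical stand-in for parse_detailpage_urls(): select the product
# links, read their href attributes, and prefix them with the domain
# (assumes the hrefs are relative paths).
sketch_detailpage_urls <- function(listpagehtml, domain = "https://geizhals.at") {
  links <- xml_find_all(listpagehtml, "//a[@class='productlist__link']")
  hrefs <- xml_attr(links, "href")
  # paste0() would turn an empty input into a single string, so return early
  if (length(hrefs) == 0) {
    return(character(0))
  }
  paste0(domain, hrefs)
}

urls <- sketch_detailpage_urls(toy_html, domain = "https://geizhals.eu")
```

In this sketch the result is a plain character vector with one element per product link; the unrelated "About" link is ignored because it does not carry the assumed class.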

Examples

## Not run: 
## get html of a geizhals category listing page via read_html:
url_geizhals <- "https://geizhals.at/?cat=acam35"
listpagehtml <- xml2::read_html(url_geizhals)
parse_detailpage_urls(listpagehtml)

## get html of multiple geizhals category listing pages:
listpagehtml_list <- fetch_all_listpages(url_geizhals)
parse_detailpage_urls(listpagehtml_list[[1]])

## get html from a geizhals.eu page and parse:
url_geizhals <- "https://geizhals.eu/?cat=acam35"
listpagehtml <- xml2::read_html(url_geizhals)
parse_detailpage_urls(listpagehtml, domain = "https://geizhals.eu")

## End(Not run)

ingonader/rgeizhals documentation built on May 29, 2019, 3:05 a.m.