parse_single_listpage: Parse information from a geizhals category page

Description

Returns information (e.g., product names, product ratings, number of ratings, detail page URLs) from a geizhals page that lists all products within a specific category. This is not the generic site-wide search reachable from the search bar, but the page showing all items in one category. If filters are applied to that page, only the items matching the filter are shown and scraped. The order of the returned items might not match the order on the webpage, but it is the same in all related functions.

Usage

parse_single_listpage(listpagehtml, domain = "https://geizhals.at")

Arguments

listpagehtml

HTML document of a single geizhals page listing the items in a selected category, as returned by xml2::read_html(), or a single element of the list of listing pages returned by fetch_all_listpages().

domain

Character vector of length one specifying the domain. Defaults to "https://geizhals.at".
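The documentation does not say how `domain` is applied. Assuming it serves as the base for turning relative product links into absolute detail page URLs, the sketch below shows that step using xml2's `url_absolute()`. The HTML fragment and the `/a123456.html` link are invented for illustration and do not reflect geizhals' actual markup or the internals of parse_single_listpage().

```r
library(xml2)

# Invented listing fragment; the real geizhals markup differs.
page <- read_html('<html><body><a href="/a123456.html">Camera X</a></body></html>')

# Extract the (relative) href of every link in the fragment.
rel_urls <- xml_attr(xml_find_all(page, "//a"), "href")

# Assumption: `domain` is used as the base URL for resolving relative links.
url_at <- url_absolute(rel_urls, "https://geizhals.at")
url_eu <- url_absolute(rel_urls, "https://geizhals.eu")

print(url_at)
print(url_eu)
```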

Value

A tibble (a data.frame subclass) containing the information scraped from the geizhals page, with one row per listed item.

Examples

## Not run: 
## get html of a geizhals category listing page via read_html:
url_geizhals <- "https://geizhals.at/?cat=acam35"
listpagehtml <- xml2::read_html(url_geizhals)
parse_single_listpage(listpagehtml)

## get html of multiple geizhals category listing pages:
listpagehtml_list <- fetch_all_listpages(url_geizhals)
parse_single_listpage(listpagehtml_list[[1]])

## get html from a geizhals.eu page and parse:
url_geizhals <- "https://geizhals.eu/?cat=acam35"
listpagehtml <- xml2::read_html(url_geizhals)
parse_single_listpage(listpagehtml, domain = "https://geizhals.eu")

## End(Not run)

ingonader/rgeizhals documentation built on May 29, 2019, 3:05 a.m.