get_geizhals_data: Get data from geizhals list and detail pages

Description

Starting from a URL, get the information on all items in the list at that URL (and on the following pages), as well as the information on the detail pages that correspond to these items.

Usage

get_geizhals_data(firstlistpageurl, max_pages = 10, max_items = Inf,
  delay_listpage = NA, delay_detailpage = NA, domain = NA)

Arguments

firstlistpageurl

The URL of a single geizhals page listing items in a selected category.

max_pages

Maximum number of list pages to be scraped. Default is 10.

max_items

A numeric vector of length one specifying the maximum number of items to scrape. Default is Inf. If the list pages contain more than max_items items, only the first max_items are fetched.

delay_listpage

Number of seconds to wait between fetching subsequent list pages (defaults to NA).

delay_detailpage

Number of seconds to wait after fetching the HTML of each detail page (defaults to NA).

domain

Character vector of length one specifying the domain. If omitted (the default, NA), the domain is extracted from firstlistpageurl.
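
The arguments above describe three small behaviours: capping the number of items, pausing between requests, and deriving the domain from the first URL. The following base R sketch shows one way each could work. The helper names limit_items, wait_if_set and extract_domain are hypothetical and not part of rgeizhals, and treating a delay of NA as "no pause" is an assumption, not documented behaviour of the package.

```r
# Illustrative helpers only; these names are hypothetical and not part of rgeizhals.

# Keep at most max_items entries; works with max_items = Inf as well.
limit_items <- function(items, max_items = Inf) {
  items[seq_len(min(length(items), max_items))]
}

# Pause for `delay` seconds; NA is assumed here to mean "do not pause".
wait_if_set <- function(delay = NA) {
  if (is.na(delay)) return(invisible(FALSE))
  Sys.sleep(delay)
  invisible(TRUE)
}

# Take the host name from a URL: drop the scheme,
# then cut at the first "/", "?" or "#".
extract_domain <- function(url) {
  host <- sub("^[a-zA-Z][a-zA-Z0-9+.-]*://", "", url)
  sub("[/?#].*$", "", host)
}

limit_items(c("a", "b", "c", "d"), max_items = 3)   # "a" "b" "c"
extract_domain("https://geizhals.at/?cat=acam35")   # "geizhals.at"
```

A non-zero delay is generally worth setting when scraping more than a handful of pages, so that requests are spread out rather than sent back to back.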

Value

A tibble (data.frame) with all the information from the list pages and the corresponding detail pages. Each row corresponds to one product.

Examples

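## Not run: 
# Scrape only the first list page of a category, plus its detail pages
url_geizhals <- "https://geizhals.at/?cat=acam35"
dat_gh <- get_geizhals_data(url_geizhals, max_pages = 1)
head(dat_gh)

# Fetch at most three items, waiting one second between requests
dat_gh <- get_geizhals_data(url_geizhals, max_items = 3,
  delay_listpage = 1, delay_detailpage = 1)
head(dat_gh)

## End(Not run)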

ingonader/rgeizhals documentation built on May 29, 2019, 3:05 a.m.