eg_collect_location_links: Collect elgrocer data

View source: R/eg_collect_funcs.R


Description

The five eg_collect_* functions are designed to be run in sequence: each one scrapes the elgrocer website, usually starting from the links returned by the previous function, and returns the data indicated by its name.

Usage

eg_collect_location_links(remDr = remDr, url = "https://www.elgrocer.com")

eg_collect_stores_details(
  remDr = remDr,
  links_to_use,
  sleep_min = 0,
  sleep_max = 1,
  url = "https://www.elgrocer.com"
)

eg_collect_categories(
  remDr = remDr,
  links_to_use,
  sleep_min = 0,
  sleep_max = 1,
  url = "https://www.elgrocer.com"
)

eg_collect_subcategories(
  remDr = remDr,
  links_to_use,
  sleep_min = 0,
  sleep_max = 1,
  url = "https://www.elgrocer.com"
)

eg_collect_items(remDr = remDr, links_to_use, sleep_min = 0, sleep_max = 1)

Arguments

remDr

RSelenium remote driver client, e.g. the client element of the object returned by RSelenium::rsDriver()

url

Base URL of the elgrocer website

links_to_use

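Links returned by the previous function in the chain: location links for eg_collect_stores_details, store links for eg_collect_categories, category links for eg_collect_subcategories, and subcategory links for eg_collect_items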

sleep_min

Minimum time, in seconds, for which execution is suspended between requests

sleep_max

Maximum time, in seconds, for which execution is suspended between requests

Value

*_location_links: Tibble with the URL for each location

*_stores_details: Tibble with store links

*_categories: Tibble with category links

*_subcategories: Tibble with subcategory links

*_items: Tibble with product details
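The chain between these return values and the links_to_use argument can be summarised as a small lookup table. This is a sketch inferred from the Examples section (the column names location_link, store_link, category_link and subcategory_link come from there); the table itself, eg_pipeline, is hypothetical and not part of the package, and the column names of the *_items output are not documented here, so they are left as NA.

```r
# Hypothetical summary of the collection chain, inferred from the Examples:
# each function consumes one column of the previous function's output tibble.
eg_pipeline <- data.frame(
  step = c("eg_collect_location_links", "eg_collect_stores_details",
           "eg_collect_categories", "eg_collect_subcategories",
           "eg_collect_items"),
  input_column = c(NA, "location_link", "store_link",
                   "category_link", "subcategory_link"),
  # Output columns of eg_collect_items are not documented, hence NA
  output_column = c("location_link", "store_link", "category_link",
                    "subcategory_link", NA),
  stringsAsFactors = FALSE
)

# Check that every step's input is the previous step's output
stopifnot(identical(eg_pipeline$input_column[-1],
                    eg_pipeline$output_column[-nrow(eg_pipeline)]))
print(eg_pipeline)
```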

Note

To avoid overloading the website, the scraper functions have built-in sleep functionality: whenever the sleep function nytnyt is called, execution is suspended (i.e., the function goes to sleep) for a random time interval, usually less than 11 seconds. See the vignette for more information.
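The idea of a randomised pause can be sketched in a few lines of base R. This is a minimal sketch, assuming the pause length is drawn uniformly between sleep_min and sleep_max; the helper polite_sleep is hypothetical and is not the implementation of nytnyt, whose internals are described in the vignette.

```r
# Hypothetical helper (NOT grocerycart's nytnyt): pause for a random number
# of seconds drawn uniformly from [sleep_min, sleep_max].
polite_sleep <- function(sleep_min = 0, sleep_max = 1) {
  stopifnot(is.numeric(sleep_min), is.numeric(sleep_max),
            sleep_min >= 0, sleep_max >= sleep_min)
  pause <- stats::runif(1, min = sleep_min, max = sleep_max)
  Sys.sleep(pause)
  invisible(pause)  # return the pause length so callers can log it
}

# Usage: pause briefly between two page visits
waited <- polite_sleep(sleep_min = 0, sleep_max = 0.2)
message(sprintf("Slept for %.2f seconds", waited))
```

Randomising the pause, rather than using a fixed delay, makes the request pattern less regular; raising sleep_max trades speed for a lighter load on the site.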

These functions are verbose so that the user can follow their progress while they run.

See Also

oc_collect_categories for data collection from Ocado. nytnyt for sleep functionality.

Examples

## Not run: 
# Start a Selenium server and a Firefox client
rD <- RSelenium::rsDriver(port = netstat::free_port(),
                          browser = "firefox", verbose = FALSE)
remDr <- rD$client

# (A) Collect all location links
eg_location <- eg_collect_location_links(remDr = remDr, url = "https://www.elgrocer.com")

# (B) Collect store details from 5 locations
eg_store <- eg_collect_stores_details(remDr, eg_location$location_link[1:5])

# (C) Collect categories from 3 stores
eg_category <- eg_collect_categories(remDr, eg_store$store_link[1:3])

# (D) Collect subcategories from 3 randomly chosen categories
random_category_links <- sample(seq_along(eg_category$category_link), 3)
eg_subcategory <- eg_collect_subcategories(remDr,
                                           eg_category$category_link[random_category_links])

# (E) Collect product data from 2 randomly chosen subcategories
random_subcategory_links <- sample(seq_along(eg_subcategory$subcategory_link), 2)
eg_item <- eg_collect_items(remDr,
                            eg_subcategory$subcategory_link[random_subcategory_links])

# Close the browser, stop the server, and free memory
remDr$close()
rD$server$stop()
rm(rD, remDr)
gc()

## End(Not run)

moamiristat/grocerycart documentation built on June 15, 2022, 10 a.m.