oc_collect_categories: Collect ocado data

View source: R/oc_collect_funcs.R

oc_collect_categories    R Documentation

Collect ocado data

Description

The five oc_collect_* functions scrape the Ocado website. Each one returns the data named in its suffix (see Value).

Usage

oc_collect_categories(remDr = remDr, url = "https://www.ocado.com")

oc_collect_product_general(
  remDr = remDr,
  links_to_use,
  sleep_min = 0,
  sleep_max = 1,
  url = "https://www.ocado.com"
)

oc_collect_product_extra(
  remDr = remDr,
  links_to_use,
  sleep_min = 0,
  sleep_max = 1
)

oc_collect_product_reviews(
  remDr = remDr,
  links_to_use,
  sleep_min = 0,
  sleep_max = 1
)

oc_collect_nutrition_table(
  remDr = remDr,
  links_to_use,
  sleep_min = 0,
  sleep_max = 1
)

Arguments

remDr

Remote driver client, e.g. the client element of the object returned by RSelenium::rsDriver()

url

URL of the Ocado website

links_to_use

Links to scrape: category links for oc_collect_product_general, product links for the other functions

sleep_min

Minimum time, in seconds, to suspend executing R expressions

sleep_max

Maximum time, in seconds, to suspend executing R expressions

Value

*_categories: Tibble with category links

*_product_general: Tibble with general product data

*_product_extra: Tibble with extra product data

*_product_reviews: Tibble with product reviews

*_nutrition_table: List with products' nutrition tables

Note

To play nice with the website, the scraper functions have built-in sleep functionality: whenever the sleep function, nytnyt, is called, execution is suspended (i.e., goes to sleep) for a random time interval, usually less than 11 seconds. See the vignette for more information.
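
As a rough illustration of the idea, a random pause between requests could look like the sketch below. The helper random_sleep() and the placeholder links are hypothetical and exist only for this example. This is not the package's nytnyt function, whose arguments and behaviour may differ. It assumes the pause is drawn uniformly between sleep_min and sleep_max seconds.

```r
# Hypothetical helper (NOT the package's nytnyt()): pause for a random
# duration drawn uniformly from [sleep_min, sleep_max] seconds.
random_sleep <- function(sleep_min = 0, sleep_max = 1, verbose = TRUE) {
  stopifnot(is.numeric(sleep_min), is.numeric(sleep_max),
            sleep_min >= 0, sleep_max >= sleep_min)
  pause <- stats::runif(1, min = sleep_min, max = sleep_max)
  if (verbose) {
    message(sprintf("Sleeping for %.2f seconds", pause))
  }
  Sys.sleep(pause)
  invisible(pause)  # return the pause length without printing it
}

# Placeholder links, for illustration only
links <- c("page-1", "page-2", "page-3")

# Pause after each (imaginary) request and record how long each pause was
pauses <- vapply(links, function(link) {
  # ... a real scraper would visit `link` here ...
  random_sleep(sleep_min = 0, sleep_max = 0.2, verbose = FALSE)
}, numeric(1))
```

Drawing the pause at random, rather than using a fixed delay, spreads requests unevenly over time, which is gentler on the server than a tight loop.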

These functions are verbose, allowing the user to get a sense of the progress being made.

See Also

eg_collect_location_links for ocado data collection. nytnyt for sleep functionality.

Examples

## Not run: 
# Start a Selenium server and browser; keep the driver object so the
# server can be stopped at the end
driver <- RSelenium::rsDriver(port = netstat::free_port(),
                              browser = "firefox", verbose = FALSE)
remDr <- driver$client

# (A) Collect category links
oc_category <- oc_collect_categories(remDr = remDr)

# (B) Collect product data from 1 category
# Pass the links by name: the first positional argument is remDr
chosen_category_links <- 7
oc_product_general <- oc_collect_product_general(
  remDr = remDr,
  links_to_use = oc_category$link[chosen_category_links])

# (C) Collect extra product data for 3 randomly chosen products
set.seed(132)
random_product_links <- sample(seq_along(oc_product_general$product_link),
                               3, replace = FALSE)
chosen_product_links <- oc_product_general$product_link[random_product_links]
oc_product_extra <- oc_collect_product_extra(
  remDr = remDr, links_to_use = chosen_product_links)

# (D) Collect product reviews, if available, for the same 3 products
oc_product_review <- oc_collect_product_reviews(
  remDr = remDr, links_to_use = chosen_product_links)

# (E) Collect product nutrition table, if available, for the same 3 products
oc_nutrition_table <- oc_collect_nutrition_table(
  remDr = remDr, links_to_use = chosen_product_links)

# Close the browser, stop the server, and clean up
remDr$close()
driver$server$stop()
rm(remDr, driver)
gc()

## End(Not run)

moamiristat/grocerycart documentation built on June 15, 2022, 10 a.m.