View source: R/oc_collect_funcs.R
oc_collect_categories | R Documentation |
The five oc_collect_* functions scrape the Ocado website and return the data indicated by each function's name.
oc_collect_categories(remDr = remDr, url = "https://www.ocado.com")

oc_collect_product_general(
  remDr = remDr,
  links_to_use,
  sleep_min = 0,
  sleep_max = 1,
  url = "https://www.ocado.com"
)

oc_collect_product_extra(
  remDr = remDr,
  links_to_use,
  sleep_min = 0,
  sleep_max = 1
)

oc_collect_product_reviews(
  remDr = remDr,
  links_to_use,
  sleep_min = 0,
  sleep_max = 1
)

oc_collect_nutrition_table(
  remDr = remDr,
  links_to_use,
  sleep_min = 0,
  sleep_max = 1
)
remDr |
Remote client driver |
url |
URL of the Ocado website |
links_to_use |
Product Links |
sleep_min |
Minimum time to suspend executing R expressions |
sleep_max |
Maximum time to suspend executing R expressions |
*_categories
: Tibble with category links
*_product_general
: Tibble with general product data
*_product_extra
: Tibble with extra product data
*_product_reviews
: Tibble with product reviews
*_nutrition_table
: List with products' nutrition tables
To play nicely with the website, the scraper functions have built-in sleep functionality: each time the sleep function, nytnyt, is called, execution is suspended (i.e., the function goes to sleep) for a random time interval, usually less than 11 seconds. See the vignette for more information.
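The idea of a randomised pause can be illustrated with a minimal sketch. The helper below, random_sleep, is hypothetical and is not the package's nytnyt function, whose actual signature and defaults are not shown on this page. It assumes the pause is drawn uniformly between sleep_min and sleep_max seconds, which may differ from what the package does.

```r
# Hypothetical helper (NOT the package's nytnyt()): pause for a random
# duration drawn uniformly between sleep_min and sleep_max seconds.
random_sleep <- function(sleep_min = 0, sleep_max = 1, verbose = TRUE) {
  stopifnot(
    is.numeric(sleep_min), is.numeric(sleep_max),
    sleep_min >= 0, sleep_max >= sleep_min
  )
  pause <- stats::runif(1, min = sleep_min, max = sleep_max)
  if (verbose) message(sprintf("Sleeping for %.2f seconds", pause))
  Sys.sleep(pause)
  invisible(pause)  # return the pause length so callers can log it
}

# Usage: pause after handling each (placeholder) link
links <- c("link-1", "link-2", "link-3")
pauses <- vapply(links, function(link) {
  # ... scrape `link` here ...
  random_sleep(sleep_min = 0, sleep_max = 0.2, verbose = FALSE)
}, numeric(1))
```

Drawing the pause at random, rather than sleeping for a fixed time, spaces requests irregularly; raising sleep_min and sleep_max makes the scraper slower but gentler on the site.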
These functions are verbose, allowing the user to get a sense of the progress being made.
eg_collect_location_links
for Ocado data collection.
nytnyt
for sleep functionality.
## Not run:
# Initiate server
remDr <- RSelenium::rsDriver(
  port = netstat::free_port(),
  browser = "firefox",
  verbose = FALSE
)$client

# (A) Collect category links
oc_category <- oc_collect_categories(remDr = remDr)

# (B) Collect product data from 1 category
chosen_category_links <- 7
oc_product_general <- oc_collect_product_general(
  remDr = remDr,
  links_to_use = oc_category$link[chosen_category_links]
)

# (C) Collect extra product data for 3 products
set.seed(132)
random_product_links <- sample(
  seq_along(oc_product_general$product_link), 3, replace = FALSE
)
oc_product_extra <- oc_collect_product_extra(
  remDr = remDr,
  links_to_use = oc_product_general$product_link[random_product_links]
)

# (D) Collect product reviews, if available, for the same 3 products
oc_product_review <- oc_collect_product_reviews(
  remDr = remDr,
  links_to_use = oc_product_general$product_link[random_product_links]
)

# (E) Collect product nutrition table, if available, for the same 3 products
oc_nutrition_table <- oc_collect_nutrition_table(
  remDr = remDr,
  links_to_use = oc_product_general$product_link[random_product_links]
)

# Close the browser session, then remove the client and free memory
remDr$close()
rm(remDr)
gc()
## End(Not run)