View source: R/eg_collect_funcs.R
eg_collect_location_links
Description

The five eg_collect_* functions scrape the elgrocer website in sequence and return the data indicated by each function name.
Usage

eg_collect_location_links(remDr = remDr, url = "https://www.elgrocer.com")

eg_collect_stores_details(
  remDr = remDr,
  links_to_use,
  sleep_min = 0,
  sleep_max = 1,
  url = "https://www.elgrocer.com"
)

eg_collect_categories(
  remDr = remDr,
  links_to_use,
  sleep_min = 0,
  sleep_max = 1,
  url = "https://www.elgrocer.com"
)

eg_collect_subcategories(
  remDr = remDr,
  links_to_use,
  sleep_min = 0,
  sleep_max = 1,
  url = "https://www.elgrocer.com"
)

eg_collect_items(remDr = remDr, links_to_use, sleep_min = 0, sleep_max = 1)
Arguments

remDr         Remote client driver
url           elgrocer URL
links_to_use  Links to scrape: the links returned by the previous step (e.g., subcategory links for eg_collect_items)
sleep_min     Minimum time to suspend executing R expressions
sleep_max     Maximum time to suspend executing R expressions
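As an illustration of the throttling parameters, the call below is a hypothetical configuration (not taken from the package examples) that pauses between two and five seconds between page visits; larger values are kinder to the site but make long scrapes slower:

# Hypothetical call: scrape store details while pausing 2-5 seconds between pages
eg_store <- eg_collect_stores_details(
  remDr        = remDr,                      # running RSelenium client (see Examples below)
  links_to_use = eg_location$location_link,  # location links collected in the previous step
  sleep_min    = 2,                          # shortest pause, assumed to be in seconds
  sleep_max    = 5,                          # longest pause, assumed to be in seconds
  url          = "https://www.elgrocer.com"
)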
Value

*_location_links: Tibble with the URL for each location
*_stores_details: Tibble with store links
*_categories: Tibble with category links
*_subcategories: Tibble with subcategory links
*_items: Tibble with product details
Details

To play nicely with the website, the scraper functions have a built-in sleep functionality: they suspend execution (i.e., go to sleep) for a random time interval, usually less than 11 seconds, whenever the sleep function, nytnyt, is called. See the vignette for more information.
These functions are verbose, allowing the user to get a sense of the progress being made.
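The actual implementation of nytnyt lives in the package; a minimal sketch of the kind of random pause it performs, assuming it simply draws a waiting time between a lower and an upper bound, might look like this:

# Sketch only: a nytnyt-style random pause (the package's own nytnyt may differ)
nytnyt_sketch <- function(periods = c(1, 10)) {
  # draw a single waiting time between the lower and upper bound, in seconds
  tictoc <- runif(1, min = periods[1], max = periods[2])
  message(sprintf("Sleeping for %.2f seconds...", tictoc))
  Sys.sleep(tictoc)
}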
See Also

oc_collect_categories for data collection from Ocado.

nytnyt for sleep functionality.
Examples

## Not run:
# Initiate server
remDr <- RSelenium::rsDriver(
  port = netstat::free_port(),
  browser = "firefox",
  verbose = FALSE
)$client

# (A) Collect all location links
eg_location <- eg_collect_location_links(remDr = remDr, url = "https://www.elgrocer.com")

# (B) Collect store details from 5 locations
eg_store <- eg_collect_stores_details(remDr, eg_location$location_link[1:5])

# (C) Collect categories from 3 stores
eg_category <- eg_collect_categories(remDr, eg_store$store_link[1:3])

# (D) Collect subcategories from 3 categories
random_category_links <- sample(seq_along(eg_category$category_link), 3, replace = FALSE)
eg_subcategory <- eg_collect_subcategories(remDr, eg_category$category_link[random_category_links])

# (E) Collect product data from 2 subcategories
random_subcategory_links <- sample(seq_along(eg_subcategory$subcategory_link), 2, replace = FALSE)
eg_item <- eg_collect_items(remDr, eg_subcategory$subcategory_link[random_subcategory_links])

# Close the server and clean up
remDr$close()
rm(remDr)
gc()

## End(Not run)
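Because a full scrape can take a while, it can be worth saving each intermediate tibble as soon as it is collected; the file names below are only placeholders:

# Save intermediate results so a lost session does not discard collected data
saveRDS(eg_location, "eg_location.rds")   # placeholder file name
saveRDS(eg_item, "eg_item.rds")           # placeholder file name
# eg_item <- readRDS("eg_item.rds")       # reload later if needed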