Description

scrapePage is a general (i.e. not sport-specific) function that scrapes the supplied URL, with basic error handling in the event of repeated failures. It creates a local variable called page, initialised to NULL. If a scraping attempt succeeds, page is converted to an XML document; otherwise the function retries until it succeeds or the specified number of attempts is exceeded. An additional parameter specifies the time to wait between attempts.
Usage

scrapePage(url, numAttempts, sleepTime = 0)
Arguments

url: string. The URL of the page to be scraped. Note: "http://www.betfair.com" is valid, while "www.betfair.com" is not. Required; no default.

numAttempts: integer. The number of attempts to make before aborting this scraping call. Required; no default.

sleepTime: integer. The time (in seconds) the function waits after a failed scraping attempt. Optional; defaults to 0.
Value

If successful, the function returns an xml_document containing the web page data (e.g. nodes and links). If all scraping attempts fail, a data frame describing the source of the last error is returned.
See Also

https://cran.r-project.org/web/packages/rvest/rvest.pdf for general information on harvesting (scraping) web pages with rvest.
Examples

## Not run:
# Simple example of a valid function call (with no wait between successive attempts):
scrapePage("http://www.betfair.com", 2)
# Simple example of an invalid function call (URL lacks the http:// prefix;
# 5-second wait between successive attempts).
# Note the difference in the output structure between the two cases:
scrapePage("www.betfair.com", 2, 5)
## End(Not run)
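The retry behaviour described above can be sketched as follows. This is a minimal illustration of the pattern, not the package's actual source: the helper name retryScrape is hypothetical, and the use of xml2::read_html for parsing is an assumption.

```r
# Sketch of the retry-with-sleep pattern described in the Description
# section (hypothetical helper; the real scrapePage may differ).
# Assumes the xml2 package is installed.
retryScrape <- function(url, numAttempts, sleepTime = 0) {
  page <- NULL
  for (attempt in seq_len(numAttempts)) {
    page <- tryCatch(
      xml2::read_html(url),      # on success: parse into an xml_document
      error = function(e) {
        Sys.sleep(sleepTime)     # wait before the next attempt
        NULL                     # on failure: keep page as NULL
      }
    )
    if (!is.null(page)) break    # stop retrying once a scrape succeeds
  }
  page                           # xml_document on success, NULL otherwise
}
```

Called as retryScrape("http://www.betfair.com", 2), this makes at most two attempts and returns the parsed page from the first one that succeeds.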