View source: R/retrieve_links.R
retrieve_links | R Documentation |
retrieve_links
Retrieves the URLs of mementos stored in the Internet Archive.
retrieve_links(
  ArchiveUrls,
  encoding = "UTF-8",
  ignoreErrors = FALSE,
  filter = TRUE,
  pattern = NULL,
  nonArchive = FALSE
)
ArchiveUrls | A character string containing the URL of an Internet Archive memento. |
encoding | The encoding of the homepage. Default is "UTF-8". |
ignoreErrors | Logical. If TRUE, errors for individual URLs are ignored and scraping proceeds. Default is FALSE. |
filter | Logical. Filter links by top-level domain, so that only sub-domains of the top-level domain are returned. Default is TRUE. |
pattern | Filter links by a custom pattern instead of the top-level domain. Default is NULL. |
nonArchive | Logical. Set to TRUE to use the archiveRetriever to scrape web pages outside the Internet Archive. Default is FALSE. |
This function retrieves the links to all lower-level web pages of a homepage memento available from the Internet Archive. It returns a tibble containing the baseUrl and the links to all lower-level web pages. Note that a memento being stored in the Internet Archive does not guarantee that the content of the homepage can actually be scraped. Because the Internet Archive is an online resource, a request can always fail due to connectivity problems. The simplest remedy is to call the function again.
## Not run:
retrieve_links("http://web.archive.org/web/20190801001228/https://www.spiegel.de/")
## End(Not run)
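Since requests to the Internet Archive can fail because of connectivity problems, retrying can be automated. The following is a minimal sketch, not part of archiveRetriever: `retry_call`, `max_tries` and `wait` are hypothetical names introduced here for illustration, and the helper assumes that a failed request surfaces as an ordinary R error.

```r
# Hypothetical helper (not part of archiveRetriever): call `fn` with the
# arguments in `...` up to `max_tries` times, pausing `wait` seconds
# between failed attempts.
retry_call <- function(fn, ..., max_tries = 3, wait = 2) {
  last_error <- NULL
  for (attempt in seq_len(max_tries)) {
    # Capture an error as a condition object instead of aborting.
    result <- tryCatch(fn(...), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    last_error <- result
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < max_tries) Sys.sleep(wait)
  }
  stop("All ", max_tries, " attempts failed. Last error: ",
       conditionMessage(last_error))
}

## Not run:
# library(archiveRetriever)
# links <- retry_call(
#   retrieve_links,
#   "http://web.archive.org/web/20190801001228/https://www.spiegel.de/"
# )
## End(Not run)
```

A fixed pause keeps the sketch short. For longer scraping jobs, a pause that grows with each attempt may be gentler on the Internet Archive's servers.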