retrieve_urls: Retrieving URLs from the Internet Archive
In archiveRetriever: Retrieve Archived Web Pages from the 'Internet Archive'

View source: R/retrieve_urls.R

retrieve_urls

R Documentation

retrieve_urls: Retrieving URLs from the Internet Archive

Description

retrieve_urls retrieves the URLs of mementos stored in the Internet Archive.

Usage

retrieve_urls(homepage, startDate, endDate, collapseDate = TRUE)

Arguments

`homepage`	A character vector giving the homepage to look up, including the top-level domain (e.g. "www.spiegel.de")
`startDate`	A character vector giving the start date of the time frame. Accepts a wide variety of date formats (see the anytime package)
`endDate`	A character vector giving the end date of the time frame. Accepts a wide variety of date formats (see the anytime package)
`collapseDate`	A logical value indicating whether the output should be limited to one memento per day

Value

This function retrieves the mementos of a homepage that are available from the Internet Archive. It returns a character vector with the URLs of all mementos stored in the Internet Archive for the requested time frame. The mementos refer only to the homepage itself, not to its lower-level web pages. Note that a memento being stored in the Internet Archive does not guarantee that the content of the homepage can actually be scraped. Because the Internet Archive is an online resource, a request can also fail due to connectivity problems. In that case, the simplest remedy is to call the function again.
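
The advice to call the function again can be automated with a small wrapper. The following is only a sketch: `with_retry` and its arguments `max_tries` and `wait` are hypothetical names and are not part of archiveRetriever. It assumes that a failed request surfaces as an R error.

```r
# Hypothetical helper (not part of archiveRetriever): call `fun` up to
# `max_tries` times, pausing `wait` seconds between failed attempts.
with_retry <- function(fun, max_tries = 3, wait = 2) {
  stopifnot(is.function(fun), max_tries >= 1)
  last_error <- NULL
  for (attempt in seq_len(max_tries)) {
    # Wrap the value in a list so it cannot be confused with a caught error.
    result <- tryCatch(list(value = fun()), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result$value)
    }
    last_error <- result
    if (attempt < max_tries) Sys.sleep(wait)
  }
  stop("All ", max_tries, " attempts failed. Last error: ",
       conditionMessage(last_error), call. = FALSE)
}

## Not run:
# library(archiveRetriever)
# urls <- with_retry(function() {
#   retrieve_urls("www.spiegel.de", "20190801", "20190901")
# })
## End(Not run)
```

Passing the call as a function keeps the wrapper independent of the arguments, so the same helper can wrap any other request that may fail intermittently.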

Examples

## Not run: 
retrieve_urls("www.spiegel.de", "20190801", "20190901")
retrieve_urls("nytimes.com", startDate = "2018-01-01", endDate = "01/02/2018")
retrieve_urls("nytimes.com", startDate = "2018-01-01", endDate = "2018-01-02", collapseDate = FALSE)

## End(Not run)
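
When `collapseDate = FALSE` is used, it can be handy to recover the capture date of each memento from the returned strings. The sketch below assumes that the strings follow the usual Wayback Machine pattern `http://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>`. The helper `memento_dates` is a hypothetical name, and the example URLs are made up for illustration rather than taken from actual output.

```r
# Hypothetical helper (not part of archiveRetriever): extract the capture
# date from memento URLs, assuming a 14-digit timestamp after "/web/".
memento_dates <- function(urls) {
  stamps <- sub("^.*/web/([0-9]{14})[^/]*/.*$", "\\1", urls)
  # sub() returns its input unchanged when the pattern does not match.
  stamps[stamps == urls] <- NA_character_
  as.Date(substr(stamps, 1, 8), format = "%Y%m%d")
}

# Made-up strings in the assumed format
example_urls <- c(
  "http://web.archive.org/web/20180101001237/https://www.nytimes.com/",
  "http://web.archive.org/web/20180101134502/https://www.nytimes.com/",
  "http://web.archive.org/web/20180102000310/https://www.nytimes.com/"
)

dates <- memento_dates(example_urls)

# Keep the first memento found for each calendar day
one_per_day <- example_urls[!duplicated(dates)]
```

Which capture the package itself keeps per day when `collapseDate = TRUE` is not stated here, so this thinning step should be read as an approximation of that behaviour, not a reimplementation.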

archiveRetriever documentation built on Nov. 5, 2025, 7:25 p.m.