retrieve_urls: Retrieving URLs from the Internet Archive

Description

retrieve_urls retrieves the URLs of the mementos of a homepage that are stored in the Internet Archive.

Usage

retrieve_urls(homepage, startDate, endDate, collapseDate = TRUE)

Arguments

homepage

A character vector specifying the homepage, including the top-level domain (e.g. "nytimes.com").

startDate

A character vector giving the start date of the time frame. Accepts a wide variety of date formats (see anytime).

endDate

A character vector giving the end date of the time frame. Accepts a wide variety of date formats (see anytime).

collapseDate

A logical value indicating whether the output should be limited to one memento per day. Defaults to TRUE.
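To illustrate what limiting the output to one memento per day means, the sketch below collapses a vector of memento URLs by calendar day. It is a hypothetical helper, not part of archiveRetriever, and it assumes Wayback-style URLs of the form http://web.archive.org/web/<14-digit timestamp>/<original URL>. Which memento retrieve_urls itself keeps for each day is not specified here; this sketch simply keeps the first one it encounters.

```r
# Hypothetical helper (not part of archiveRetriever): keep one memento URL per
# calendar day. ASSUMES Wayback-style URLs of the form
# http://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>.
collapse_by_day <- function(memento_urls) {
  # Extract the 14-digit timestamp that follows "/web/"
  stamps <- sub(".*/web/([0-9]{14}).*", "\\1", memento_urls)
  # The first 8 digits (YYYYMMDD) identify the day
  days <- substr(stamps, 1, 8)
  # Keep the first URL seen for each day (the earliest, if input is sorted)
  memento_urls[!duplicated(days)]
}

urls <- c(
  "http://web.archive.org/web/20190801000407/https://www.spiegel.de/",
  "http://web.archive.org/web/20190801120000/https://www.spiegel.de/",
  "http://web.archive.org/web/20190802093015/https://www.spiegel.de/"
)
collapse_by_day(urls)  # keeps the first and third URL
```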

Value

A character vector containing the URLs of all mementos of the homepage that are stored in the Internet Archive within the specified time frame. The mementos refer only to the homepage itself, not to its lower-level web pages. Note that a memento being stored in the Internet Archive does not guarantee that the homepage's content can actually be scraped. Because the Internet Archive is an online resource, a request may also fail due to connectivity problems; the simplest remedy is to re-run the function.
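Re-running the function can be automated. The sketch below is a hypothetical, package-independent helper (with_retries is not part of archiveRetriever) that calls any function, catches errors with base R's tryCatch, and retries a fixed number of times with a pause between attempts. The default of three attempts and a five-second wait are arbitrary assumptions.

```r
# Hypothetical helper (not part of archiveRetriever): call f(...) and re-try
# it if it throws an error, pausing between attempts.
with_retries <- function(f, ..., max_attempts = 3, wait_seconds = 5) {
  stopifnot(max_attempts >= 1)
  for (attempt in seq_len(max_attempts)) {
    # Capture an error as a condition object instead of aborting
    result <- tryCatch(f(...), error = function(e) e)
    # Return immediately on success
    if (!inherits(result, "error")) return(result)
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    # Pause before the next attempt (but not after the last one)
    if (attempt < max_attempts) Sys.sleep(wait_seconds)
  }
  stop("All ", max_attempts, " attempts failed: ", conditionMessage(result))
}

# Usage (requires archiveRetriever and an internet connection):
# urls <- with_retries(retrieve_urls, "www.spiegel.de", "20190801", "20190901")
```

Because the helper takes the function and its arguments separately, the same wrapper can be reused for any other call that depends on the Internet Archive being reachable.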

Examples

## Not run: 
retrieve_urls("www.spiegel.de", "20190801", "20190901")
retrieve_urls("nytimes.com", startDate = "2018-01-01", endDate = "01/02/2018")
retrieve_urls("nytimes.com", startDate = "2018-01-01", endDate = "2018-01-02", collapseDate = FALSE)

## End(Not run)

archiveRetriever documentation built on June 22, 2024, 10:54 a.m.