In archiveRetriever: Retrieve Archived Web Pages from the 'Internet Archive'

View source: R/retrieve_links.R

retrieve_links

R Documentation

retrieve_links: Retrieving Links of Lower-level web pages of mementos from the Internet Archive

Description

retrieve_links retrieves the URLs of the lower-level web pages linked from mementos stored in the Internet Archive.

Usage

retrieve_links(
  ArchiveUrls,
  encoding = "UTF-8",
  ignoreErrors = FALSE,
  filter = TRUE,
  pattern = NULL,
  nonArchive = FALSE
)

Arguments

`ArchiveUrls`	A string containing the URL of a memento in the Internet Archive.
`encoding`	Specify an encoding for the homepage. Default is 'UTF-8'.
`ignoreErrors`	Ignore errors for some URLs and proceed with scraping. Default is FALSE.
`filter`	Filter links by top-level domain. Only sub-domains of the top-level domain are returned. Default is TRUE.
`pattern`	Filter links by a custom pattern instead of by top-level domain. Default is NULL.
`nonArchive`	Logical. Set to TRUE to use archiveRetriever to scrape web pages outside the Internet Archive. Default is FALSE.
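
To make the difference between `filter` and `pattern` more concrete, the sketch below applies both kinds of filtering to a hand-written vector of links. It only illustrates the idea and does not reproduce the package's internal implementation. The example links, the variable names, and the assumption that a pattern behaves like a regular expression are all illustrative and not taken from the package.

```r
# Illustrative links only; these are NOT output from retrieve_links.
links <- c(
  "http://web.archive.org/web/20190801001228/https://www.spiegel.de/politik/",
  "http://web.archive.org/web/20190801001228/https://www.spiegel.de/sport/",
  "http://web.archive.org/web/20190801001228/https://www.example.com/"
)

# Domain-style filtering (the idea behind filter = TRUE): keep only links
# that point to the same site as the memento.
same_site <- links[grepl("spiegel.de", links, fixed = TRUE)]

# Custom filtering (the idea behind the pattern argument, assumed here to
# act like a regular expression): keep only links that match the pattern.
politics_only <- links[grepl("/politik/", links)]
```

Here `same_site` keeps the two spiegel.de links, while `politics_only` keeps only the link containing "/politik/".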

Value

This function retrieves the links of all lower-level web pages of mementos of a homepage available from the Internet Archive. It returns a tibble containing the baseUrl and the links of all lower-level web pages. Note that a memento being stored in the Internet Archive does not guarantee that the information from the homepage can actually be scraped. Because the Internet Archive is an internet resource, a request can always fail due to connectivity problems. A simple remedy is to call the function again.
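
Since a request can fail for transient reasons, the advice to try again can be sketched as a small wrapper. The helper `with_retries`, its parameters `max_tries` and `wait_seconds`, and the stand-in function `flaky_fetch` are assumptions made for illustration; none of them is part of archiveRetriever. The stand-in simulates a connection problem so that the sketch runs without network access.

```r
# Hypothetical helper (not part of archiveRetriever): call `fun` until it
# succeeds or `max_tries` attempts have been made.
with_retries <- function(fun, max_tries = 3, wait_seconds = 0) {
  last_error <- NULL
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(
      list(ok = TRUE, value = fun()),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) {
      return(result$value)
    }
    last_error <- result$error
    # Pause before the next attempt, but not after the final one.
    if (attempt < max_tries) Sys.sleep(wait_seconds)
  }
  stop("All ", max_tries, " attempts failed. Last error: ",
       conditionMessage(last_error))
}

# Stand-in for a network call: fails twice, then succeeds.
calls <- 0
flaky_fetch <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("simulated connection problem")
  "links retrieved"
}

outcome <- with_retries(flaky_fetch, max_tries = 5)
```

With the package loaded, the same wrapper could be applied to a real request by passing `function() retrieve_links("http://web.archive.org/web/20190801001228/https://www.spiegel.de/")` as `fun`, ideally with a non-zero `wait_seconds` so that repeated requests are spaced out.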

Examples

## Not run: 
retrieve_links("http://web.archive.org/web/20190801001228/https://www.spiegel.de/")

## End(Not run)

archiveRetriever documentation built on Nov. 5, 2025, 7:25 p.m.
