Tools to Work with Internet Archive Wayback Machine APIs
The 'Internet Archive' provides access to millions of cached sites. Methods are provided to access these cached resources through the 'APIs' provided by the 'Internet Archive' and also content from 'MementoWeb'.
The following functions are implemented:
Memento-ish API:
- archive_available: Does the Internet Archive have a URL cached?
- cdx_basic_query: Perform a basic/limited Internet Archive CDX resource query for a URL
- get_mementos: Retrieve site mementos from the Internet Archive
- get_timemap: Retrieve a timemap for a URL
- read_memento: Read a resource directly from the Time Travel MementoWeb
- is_memento, is_first_memento, is_next_memento, is_prev_memento, is_last_memento, is_original, is_timemap, is_timegate: Various memento-type testers (useful in purrr or dplyr contexts)

Scrape API:
- ia_retrieve: Retrieve directory listings for Internet Archive objects by identifier
- ia_scrape: Internet Archive Scraping API Access
- ia_scrape_has_more: 'ia_scrape()' Pagination Helpers
- ia_scrape_next_page: Internet Archive Scraping API Access

Installation:

devtools::install_github("hrbrmstr/wayback")
Usage:

options(width=120)
library(wayback)
library(tidyverse)

# current version
packageVersion("wayback")
archive_available("https://www.r-project.org/news.html")
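archive_available() answers whether a URL has been captured. The Wayback Machine exposes a public availability endpoint at https://archive.org/wayback/available for this kind of question. The sketch below only builds such a query URL and makes no network request. It is a hypothetical helper, availability_url(), that is not part of wayback, and it does not claim that archive_available() is implemented this way.

```r
# Hypothetical helper (not exported by wayback): build a query URL for the
# Wayback Machine availability endpoint. No request is sent.
availability_url <- function(url, timestamp = NULL) {
  # reserved = TRUE percent-encodes ':' and '/' so the target URL is a
  # single query-string value
  query <- paste0("url=", utils::URLencode(url, reserved = TRUE))
  # Optional YYYYMMDD[hhmmss] timestamp requests the capture closest to it
  if (!is.null(timestamp)) {
    query <- paste0(query, "&timestamp=", timestamp)
  }
  paste0("https://archive.org/wayback/available?", query)
}

availability_url("https://www.r-project.org/news.html")
availability_url("https://www.r-project.org/news.html", timestamp = "20200101")
```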
get_mementos("https://www.r-project.org/news.html")
get_timemap("https://www.r-project.org/news.html")
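In a Memento timemap, each link carries a rel value such as original, timegate, timemap, memento, first memento, or last memento. A single rel value can hold several space-separated relation types, which is what the is_* testers listed above classify. The sketch below illustrates that idea with a hypothetical has_rel() helper. It is not wayback's implementation, and it assumes the rel values are available as a plain character vector.

```r
# Hypothetical sketch (not wayback's is_* code): does each rel value contain
# the given relation type? A rel value such as "first memento" holds
# several space-separated types, so split before matching.
has_rel <- function(rel, type) {
  vapply(
    strsplit(rel, "[[:space:]]+"),
    function(tokens) type %in% tokens,
    logical(1)
  )
}

rels <- c("original", "timemap", "timegate",
          "first memento", "memento", "last memento")
has_rel(rels, "memento")
has_rel(rels, "first") & has_rel(rels, "memento")
```

Because has_rel() returns one logical per input element, it can be used inside dplyr::filter() or purrr::keep(), the contexts the testers above are meant for.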
cdx_basic_query("https://www.r-project.org/news.html", limit = 10) %>% glimpse()
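A CDX query returns one record per capture. A capture can be replayed at web.archive.org by joining its 14-digit YYYYMMDDhhmmss timestamp and its original URL. The sketch below assumes the result keeps the CDX server's timestamp and original fields, either as 14-digit strings or parsed to POSIXct. The helper snapshot_url() is hypothetical and is not part of wayback.

```r
# Hypothetical helper (not exported by wayback): build a Wayback Machine
# replay URL from a capture timestamp and the original URL.
snapshot_url <- function(timestamp, original) {
  # A POSIXct timestamp is formatted in its own time zone, so digits that
  # were parsed from a 14-digit CDX string round-trip unchanged
  if (inherits(timestamp, "POSIXct")) {
    timestamp <- format(timestamp, "%Y%m%d%H%M%S")
  }
  stopifnot(all(grepl("^[0-9]{14}$", timestamp)))
  paste0("https://web.archive.org/web/", timestamp, "/", original)
}

snapshot_url("20200101120000", "https://www.r-project.org/news.html")
```

Since the function is vectorised, it could be applied to a whole query result, for example as mutate(replay = snapshot_url(timestamp, original)), assuming those column names.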
mem <- read_memento("https://www.r-project.org/news.html")
res <- stringi::stri_split_lines(mem)[[1]]
cat(paste0(res[187:200], collapse="\n"))
glimpse( ia_scrape("lemon curry") )
(nasa <- ia_scrape("collection:nasa", count=100L))
(item <- ia_retrieve(nasa$identifier[1]))
# mode = "wb" keeps binary files intact on Windows
download.file(item$link[1], file.path("man/figures", item$file[1]), mode = "wb")