
wayback

Tools to Work with Internet Archive Wayback Machine APIs

Description

The 'Internet Archive' provides access to millions of cached sites. This package offers methods to reach those cached resources through the 'Internet Archive' 'APIs', and to retrieve content via 'MementoWeb'.

What's Inside the Tin?

The following functions are implemented:

Memento-ish API:

Scrape API

Installation

devtools::install_github("hrbrmstr/wayback")
options(width=120)

Usage

library(wayback)
library(tidyverse)

# current version
packageVersion("wayback")

Memento-ish things

archive_available("https://www.r-project.org/news.html")
get_mementos("https://www.r-project.org/news.html")
get_timemap("https://www.r-project.org/news.html")
cdx_basic_query("https://www.r-project.org/news.html", limit = 10) %>% 
  glimpse()
mem <- read_memento("https://www.r-project.org/news.html")
res <- stringi::stri_split_lines(mem)[[1]]
cat(paste0(res[187:200], collapse = "\n"))
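
The last line above relies on the `collapse` argument of `paste0()` to join the selected lines into one string. As a small offline illustration (the vector `page_lines` is a made-up stand-in for lines of an archived page, not output from the package), the sketch below shows what `collapse` does and why a misspelled argument name goes unnoticed: R treats it as one more vector to paste.

```r
# Hypothetical stand-in for a few lines of an archived page
page_lines <- c("first line", "second line", "third line")

# collapse = "\n" joins all elements into a single string
joined <- paste0(page_lines, collapse = "\n")
cat(joined, "\n")

# A misspelled argument name is not an error: it is pasted onto each
# element, so the result still has one element per input line
not_joined <- paste0(page_lines, collaspe = "\n")

length(joined)      # 1
length(not_joined)  # 3
```

Because no error or warning is raised in the second case, checking the length of the result is a quick way to confirm the lines were actually joined.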

Scrape API

glimpse(
  ia_scrape("lemon curry")
)
(nasa <- ia_scrape("collection:nasa", count=100L))

(item <- ia_retrieve(nasa$identifier[1]))

download.file(item$link[1], file.path("man/figures", item$file[1]))
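
`download.file()` does not create missing directories, and on Windows binary files generally need `mode = "wb"`. The following is a minimal sketch of one way to prepare the destination first; `prepare_dest()` is a hypothetical helper written for this example, not a function of the package.

```r
# Hypothetical helper: make sure the target directory exists and
# return the full destination path for a given file name.
prepare_dest <- function(dir, file) {
  if (!dir.exists(dir)) dir.create(dir, recursive = TRUE)
  # basename() drops any sub-path contained in the file name
  file.path(dir, basename(file))
}

# Offline demonstration using a temporary directory and a made-up file name
dest <- prepare_dest(file.path(tempdir(), "man", "figures"), "example.jpg")
dir.exists(dirname(dest))

# With a real item (requires network access) the download could then be:
# download.file(item$link[1], prepare_dest("man/figures", item$file[1]), mode = "wb")
```

Keeping the path handling separate from the download makes it possible to check the destination without touching the network.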



hrbrmstr/wayback documentation built on May 17, 2019, 5:53 p.m.