
When you make a resource query in the main Wayback web interface, you're tapping into the Wayback CDX API. The cdx_basic_query() function in this package is a programmatic interface to that API.

An example use case was presented in GitHub Issue #3, where the issue's author wanted to query for historical CSV files from MaxMind.

The cdx_basic_query() function takes several parameters, including url, match_type, collapse, and filter. Only url is required.

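For match_type, if url is "archive.org/about/" then "exact" returns captures matching exactly archive.org/about/, "prefix" returns captures for everything under the path archive.org/about/, "host" returns captures from the host archive.org, and "domain" returns captures from archive.org and all of its subhosts (*.archive.org).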
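
As a rough illustration of what a match_type choice means at the API level, the sketch below builds the kind of request URL the public Wayback CDX Server API accepts. The endpoint and the url and matchType query parameters come from that API; cdx_url() is a hypothetical helper written for this example, not a function in this package, and it does not claim to mirror how cdx_basic_query() builds its request internally.

```r
# Hypothetical helper (not part of the wayback package): build a raw
# CDX Server API request URL for a given match type.
cdx_url <- function(url, match_type = c("exact", "prefix", "host", "domain")) {
  # match.arg() falls back to the first choice ("exact") when none is given
  match_type <- match.arg(match_type)
  paste0(
    "http://web.archive.org/cdx/search/cdx",
    "?url=", utils::URLencode(url, reserved = TRUE),
    "&matchType=", match_type
  )
}

# One request URL per match type for the same starting URL
types <- c("exact", "prefix", "host", "domain")
urls <- vapply(types, function(mt) cdx_url("archive.org/about/", mt), character(1))
print(unname(urls))
```

Only the matchType value changes between the four requests; the server decides how widely to match around the same url.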

For collapse, the returned results are "collapsed" based on a field or a substring of a field. Collapsing applies to adjacent CDX lines: within a run of captures that share the same value, every capture after the first is filtered out. This is useful for thinning out captures that are too dense or for finding unique captures.
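
The adjacency rule is easy to misread, so here is a minimal local sketch of it in base R. It runs on a small made-up data frame, not on real CDX output, and collapse_adjacent() is a hypothetical helper that only imitates the behavior described above. The width argument stands in for the "substring of a field" case, for example comparing only the first 10 digits of a 14-digit timestamp.

```r
# Hypothetical helper: keep a row only if its key differs from the key
# of the row immediately before it (adjacent duplicates are dropped).
collapse_adjacent <- function(df, field, width = NULL) {
  if (nrow(df) < 2) return(df)
  key <- as.character(df[[field]])
  # Optionally compare only the first `width` characters of the field
  if (!is.null(width)) key <- substr(key, 1, width)
  keep <- c(TRUE, key[-1] != key[-length(key)])
  df[keep, , drop = FALSE]
}

# Made-up captures, ordered by time
captures <- data.frame(
  timestamp = c("20170101000000", "20170101003000", "20170101010000",
                "20170101013000", "20170102000000"),
  digest = c("AAA", "AAA", "AAA", "BBB", "AAA"),
  stringsAsFactors = FALSE
)

# Collapse on the whole digest field: rows 1, 4 and 5 remain.
# Row 5 repeats "AAA" but is not adjacent to the earlier run, so it stays.
by_digest <- collapse_adjacent(captures, "digest")

# Collapse on the first 10 timestamp digits (one capture per hour):
# rows 1, 3 and 5 remain.
by_hour <- collapse_adjacent(captures, "timestamp", width = 10)

print(by_digest)
print(by_hour)
```

Because only neighbouring lines are compared, collapsing is not the same as a global de-duplication: a value can reappear later in the results.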

For now, filter is limited to a single expression. This will be enhanced at a later time.
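
For orientation, the CDX Server API writes a filter expression as field:regex, with an optional leading ! to invert the match. The sketch below imitates that locally on a made-up data frame. apply_cdx_filter() is a hypothetical helper, not part of this package, and it assumes the regex must match the whole field value, which may differ from what the server actually does.

```r
# Hypothetical helper: apply one "[!]field:regex" expression to a data frame.
apply_cdx_filter <- function(df, expr) {
  # A leading "!" inverts the match
  negate <- startsWith(expr, "!")
  if (negate) expr <- substring(expr, 2)

  # Split at the first ":" into field name and regex
  pos <- as.integer(regexpr(":", expr, fixed = TRUE))
  if (pos < 2) stop("expected an expression of the form 'field:regex'")
  field <- substr(expr, 1, pos - 1)
  pattern <- substring(expr, pos + 1)
  if (!field %in% names(df)) stop("unknown field: ", field)

  # Assumption: the regex has to match the entire field value
  hit <- grepl(paste0("^(?:", pattern, ")$"), as.character(df[[field]]), perl = TRUE)
  df[xor(hit, negate), , drop = FALSE]
}

# Made-up rows for illustration only
captures <- data.frame(
  original = c("http://example.com/a.csv",
               "http://example.com/index.html",
               "http://example.com/old.csv"),
  statuscode = c("200", "200", "404"),
  mimetype = c("text/csv", "text/html", "text/csv"),
  stringsAsFactors = FALSE
)

ok <- apply_cdx_filter(captures, "statuscode:200")           # keeps rows 1 and 2
not_html <- apply_cdx_filter(captures, "!mimetype:text/html") # keeps rows 1 and 3

print(ok)
print(not_html)
```

With only a single expression available, a second condition (such as the CSV check in the example that follows) has to be applied to the returned data frame afterwards.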

To put the use case into practice, we'll find CSV resources and download one of them:

library(wayback)
library(dplyr)

# query for maxmind prefix
cdx <- cdx_basic_query("http://maxmind.com/", "prefix")

# filter the returned results for CSV files
(csv <- filter(cdx, grepl("\\.csv", original)))

# examine a couple of fields
csv$original[9]

csv$timestamp[9]

# read the resource from that point in time using the "raw"
# interface so as not to mangle the data.
dat <- read_memento(csv$original[9], as.POSIXct(csv$timestamp[9]), "raw")

# read it in
readr::read_csv(dat, col_names = c("iso2c", "regcod", "name"))


hrbrmstr/wayback documentation built on May 17, 2019, 5:53 p.m.