read_warc_entry: Read a WARC entry from a WARC file


View source: R/read_warc_entry.r

Description

Given the path to a WARC file (compressed or uncompressed) and the start position of the WARC record, this function will produce an R object from the WARC record.

Usage

read_warc_entry(path, start, compressed = grepl(".gz$", path))
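
The default for compressed is computed from path with a regular expression. The sketch below evaluates that same expression on a few made-up file names (the names are hypothetical and not part of the package). Because the leading "." is unescaped, it matches any character, so a name that merely ends in "gz" is also treated as compressed. A stricter pattern is shown for comparison only; it is not what the function uses.

```r
# Hypothetical file names, used only to illustrate the default expression.
paths <- c("crawl.warc.gz", "crawl.warc", "crawl.warcgz")

# The expression from the Usage line: "." matches any single character.
compressed <- grepl(".gz$", paths)

# A stricter alternative (an assumption, not the package default):
# the escaped dot only matches a literal ".".
strict <- grepl("\\.gz$", paths)

print(data.frame(path = paths, default = compressed, strict = strict))
```

If a file name does not follow the ".gz" convention, passing compressed explicitly avoids relying on this guess.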

Arguments

path

path to WARC file

start

starting offset of WARC record

compressed

whether the WARC file is gzip-compressed; by default this is TRUE when path ends in "gz" (see Usage)

Details

WARC warcinfo objects are returned classed both warc and info.

WARC response objects are returned classed both warc and httr::response and the standard httr content functions will work with the object.

WARC request objects are returned classed both warc and httr::request.
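
Since every returned object carries two classes, calling code can branch on the record type with inherits(). The following is a minimal sketch using mock objects rather than real WARC records. It assumes that the httr classes appear as the bare S3 class names "response" and "request", and warc_record_kind() is a hypothetical helper, not a function in the package.

```r
# Mock stand-ins for read_warc_entry() results. Only the class vectors
# described above are reproduced; the (empty) contents are made up.
mock_info     <- structure(list(), class = c("warc", "info"))
mock_response <- structure(list(), class = c("warc", "response"))
mock_request  <- structure(list(), class = c("warc", "request"))

# Hypothetical helper: name the kind of WARC record an object represents.
warc_record_kind <- function(x) {
  stopifnot(inherits(x, "warc"))
  if (inherits(x, "info")) {
    "warcinfo"
  } else if (inherits(x, "response")) {
    "response"
  } else if (inherits(x, "request")) {
    "request"
  } else {
    "unsupported"
  }
}

kinds <- vapply(
  list(mock_info, mock_response, mock_request),
  warc_record_kind,
  character(1)
)
print(kinds)
```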

Note

Only warcinfo, request and response records are currently supported.

Examples

## Not run: 
cdx <- read_cdx(system.file("extdata", "20160901.cdx", package="warc"))
i <- 1
path <- file.path(cdx$warc_path[i], cdx$file_name[i])
start <- cdx$compressed_arc_file_offset[i]

(read_warc_entry(path, start))

## End(Not run)
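
To make the role of start more concrete, the sketch below builds a tiny uncompressed file with two hand-written, WARC-like records and reads the second one by seeking to its byte offset. It uses base R only and does not call read_warc_entry(). The record contents are made up, and the offset is assumed to be a 0-based byte position. Compressed files and full record parsing are out of scope here.

```r
# Two minimal, hand-written records (illustrative only, not a full WARC file).
rec1 <- "WARC/1.0\r\nWARC-Type: warcinfo\r\nContent-Length: 0\r\n\r\n\r\n\r\n"
rec2 <- "WARC/1.0\r\nWARC-Type: response\r\nContent-Length: 0\r\n\r\n\r\n\r\n"

path <- tempfile(fileext = ".warc")
writeBin(charToRaw(paste0(rec1, rec2)), path)

# Assumed 0-based byte offset of the second record: the length of the first.
start <- nchar(rec1, type = "bytes")

con <- file(path, "rb")
invisible(seek(con, where = start))  # jump to the start of the record
bytes <- readBin(con, what = "raw", n = nchar(rec2, type = "bytes"))
close(con)

# Split the record header into lines and show the first two.
header <- strsplit(rawToChar(bytes), "\r\n", fixed = TRUE)[[1]]
print(header[1:2])

unlink(path)
```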

hrbrmstr/warc documentation built on May 17, 2019, 5:53 p.m.