read_cdx: Read a WARC CDX index file

Description

CDX files are used to index the content of WARC files.

Usage

read_cdx(path, warc_path = NULL)

Arguments

path

path to CDX file

warc_path

path to the WARC files referenced in path. Defaults to the location of the CDX file

Details

The returned object is a tbl_df that also carries the class cdx.

References

https://iipc.github.io/warc-specifications/specifications/cdx-format/cdx-2015/

Examples

## Not run: 
cdx <- read_cdx(system.file("extdata", "20160901.cdx", package="warc"))

## End(Not run)
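
The following is a minimal sketch, not taken from the package documentation, of passing warc_path explicitly and checking the classes described in Details. It assumes the warc package is installed and that the WARC files referenced by the sample index sit in the same directory as the CDX file, which is what the default is documented to resolve to anyway. The variable name cdx_file is illustrative only.

```r
# Sketch only: runs the calls if the warc package is available, otherwise does nothing.
if (requireNamespace("warc", quietly = TRUE)) {
  # Sample index shipped with the package, as in the example above
  cdx_file <- system.file("extdata", "20160901.cdx", package = "warc")

  # Assumption: the referenced WARC files live next to the CDX file,
  # so this mirrors what the default (warc_path = NULL) is documented to use
  cdx <- warc::read_cdx(cdx_file, warc_path = dirname(cdx_file))

  # Per Details, the result should be a tbl_df that is also classed cdx
  print(inherits(cdx, "cdx"))
  print(inherits(cdx, "tbl_df"))
}
```

Setting warc_path is presumably only needed when the WARC files have been moved away from the index that describes them.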

hrbrmstr/warc documentation built on May 17, 2019, 5:53 p.m.