create_cdx: Create a CDX from a WARC file
In hrbrmstr/warc: Tools to Work with the Web Archive Ecosystem

Takes as input an optionally compressed WARC file, creates a CDX file that indexes the records matching warc_record_types with the specified fields (if available), and writes it to cdx_path. If the WARC file is compressed, the CDX/WARC specification expects each WARC record to be in its own "gzstream" (i.e., you can't just gzip a plaintext WARC file and expect any CDX indexer to work).
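As a rough illustration of the "one gzstream per record" layout, the sketch below uses only base R to append each record to the output file as its own gzip member. The record strings are placeholders, not valid WARC records, and `records`, `out_path` and `lines_back` are names made up for this example; it only shows the compression layout and is not how the package itself writes files.

```r
# Placeholder payloads standing in for complete WARC records (assumption:
# real records would carry full WARC headers and content blocks).
records <- c("placeholder record 1\n", "placeholder record 2\n")

out_path <- tempfile(fileext = ".warc.gz")

# Open the file in append mode once per record so every record is
# compressed as a separate gzip member rather than one continuous stream.
for (rec in records) {
  con <- gzfile(out_path, open = "ab")
  writeBin(charToRaw(rec), con)
  close(con)
}

# Reading the file back through gzfile() decompresses the members in sequence.
con <- gzfile(out_path, open = "rt")
lines_back <- readLines(con)
close(con)
```

By contrast, running gzip over a whole plaintext WARC file produces a single stream, so an indexer cannot seek to a compressed offset and decompress one record on its own.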

create_cdx(warc_path, warc_record_types = "response", field_spec = "abmsrVgu", cdx_path)

`warc_path`	path to the WARC file to index
`warc_record_types`	the WARC record types to index in the CDX file. Should be a character vector of record type names or "`all`" to index all records. NOTE: Most CDX files index WARC `response` records.
`field_spec`	(See `Details`)
`cdx_path`	where to output the CDX file
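
A minimal usage sketch, assuming the package is installed and exports `create_cdx` under the `warc` namespace. The file names `example.warc.gz` and `example.cdx` are hypothetical; the other argument values are the defaults shown in the usage line. The call is guarded so that nothing runs when the package or the input file is missing.

```r
# Hypothetical paths: replace with a real WARC file and the desired output.
warc_path <- "example.warc.gz"
cdx_path  <- "example.cdx"

# Only attempt the call when the package and the input file are available.
if (requireNamespace("warc", quietly = TRUE) && file.exists(warc_path)) {
  warc::create_cdx(
    warc_path         = warc_path,
    warc_record_types = "response",  # default: index response records
    field_spec        = "abmsrVgu",  # default field specification
    cdx_path          = cdx_path
  )
}
```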

Use an atomic character vector of single-character CDX field specifications, in the order you want them to appear in the CDX file. The default value, "abmsrVgu", is taken from the defaults used by wget in "WARC mode" and will output the: