payload_content: Helper function to convert WARC raw headers+payload into...


Description

This works much the same way as the content() function in the httr package and conforms to its API for the as, type, encoding and ... fields.

Usage

payload_content(url, ctype = NULL, headers, payload, as = NULL,
  type = NULL, encoding = NULL, ...)

Arguments

url, ctype, headers, payload

raw content from the target_uri, http_protocol_content_type, http_raw_headers & payload fields of a WARC data frame.

as

Desired type of output: raw, text or parsed. payload_content() attempts to automatically figure out which one is most appropriate, based on the content-type.

type

MIME type (aka internet media type) used to override the content type returned by the server. See http://en.wikipedia.org/wiki/Internet_media_type for a list of common types.

encoding

For text, overrides the charset or the Latin1 (ISO-8859-1) default, if you know that the server is returning the incorrect encoding as the charset in the content-type. Use for text and parsed outputs.

...

Other parameters passed on to the parsing functions, if as = "parsed".

Details

Unlike its httr counterpart, payload_content() can handle gzip'd payload contents (httr has it easy, since curl decodes the gzip content automatically for it). It makes a best guess at the expanded content size, so it is not 100% guaranteed to work for all gzip'd payload content.
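
A minimal usage sketch, following the signature shown under Usage. It assumes a WARC data frame with the target_uri, http_protocol_content_type, http_raw_headers and payload fields named under Arguments, and that the last two are list columns holding one raw vector per record. The helper name record_text() and the warc_df argument are hypothetical and not part of jwatr.

```r
library(jwatr)

# Hypothetical helper: extract record `i` of a WARC data frame as text.
# Assumes `warc_df` has the four fields listed under Arguments, and that
# `http_raw_headers` and `payload` are list columns of raw vectors.
record_text <- function(warc_df, i = 1, encoding = NULL) {
  payload_content(
    url      = warc_df$target_uri[i],
    ctype    = warc_df$http_protocol_content_type[i],
    headers  = warc_df$http_raw_headers[[i]],
    payload  = warc_df$payload[[i]],
    as       = "text",     # one of "raw", "text" or "parsed"
    encoding = encoding    # NULL keeps the charset / Latin1 default
  )
}
```

Because the expanded size of gzip'd payloads is a best guess, it may be worth wrapping calls like this in tryCatch() when looping over many records.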


hrbrmstr/jwatr documentation built on May 31, 2019, 1:15 p.m.