dynCurlReader: Dynamically determine content-type of body from HTTP header...

dynCurlReader R Documentation

Dynamically determine content-type of body from HTTP header and set body reader

Description

This function creates a reader for the response to a curl HTTP request. Its update element is supplied as the headerfunction option (as in the examples below) and reads the header of the HTTP response. When the header is complete (signalled by a blank line), the reader examines the header for a Content-Type field, uses its value to determine the nature of the body, and dynamically (re)sets the writefunction for the curl handle accordingly. If the content is binary, it collects the content into a raw vector; if it is text, it sets the appropriate character encoding and collects the content into a character vector.

This function is like basicTextGatherer but behaves dynamically by determining how to read the content based on the header of the HTTP response. This function returns a list of functions that are used to update and query a shared state across calls.
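The dispatch described above can be illustrated without any network access. The following is only a minimal sketch of the idea in plain R, not the code RCurl actually uses: the helper names parseContentType and looksBinary are hypothetical, and the rule for deciding what counts as text is an assumption made for illustration.

```r
# Hypothetical helper (not part of RCurl): find the Content-Type field in a
# vector of header lines and split it into media type and charset.
parseContentType <- function(headerLines) {
  i <- grep("^content-type:", headerLines, ignore.case = TRUE)
  if (length(i) == 0)
    return(list(type = NA_character_, charset = NA_character_))

  # Use the last Content-Type line and drop the field name.
  value <- sub("^[^:]*:[[:space:]]*", "", headerLines[i[length(i)]])
  parts <- trimws(strsplit(value, ";", fixed = TRUE)[[1]])

  # Look for an optional charset parameter after the media type.
  cs <- grep("^charset=", parts[-1], ignore.case = TRUE, value = TRUE)
  charset <- if (length(cs) > 0) {
    gsub('"', "", sub("^charset=", "", cs[1], ignore.case = TRUE))
  } else {
    NA_character_
  }
  list(type = tolower(parts[1]), charset = charset)
}

# Assumed rule of thumb: text/* and a few structured formats are treated as
# text; anything else is treated as binary. NA means "cannot tell".
looksBinary <- function(type) {
  if (is.na(type)) return(NA)
  !grepl("^text/|json|xml|javascript", type)
}

hdr <- c("HTTP/1.1 200 OK",
         "Content-Type: text/html; charset=UTF-8",
         "Content-Length: 1234",
         "")
ct <- parseContentType(hdr)
print(ct$type)                    # "text/html"
print(ct$charset)                 # "UTF-8"
print(looksBinary(ct$type))       # FALSE: collect into a character vector
print(looksBinary("image/jpeg"))  # TRUE: collect into a raw vector
```

A reader built along these lines would switch to a raw-vector collector when looksBinary() is TRUE, and otherwise use the charset as the encoding for a character-vector collector.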

Usage

dynCurlReader(curl = getCurlHandle(), txt = character(), max = NA,
              value = NULL, verbose = FALSE, binary = NA, baseURL = NA,
              isHTTP = NA, encoding = NA)

Arguments

curl

the curl handle to be used for the request. It is essential that this handle be used in the low-level call to curlPerform so that the update element sets the reader for the body on the appropriate curl handle that is used in the request.

txt

initial value of the text. This is almost always an empty character vector.

max

the maximum number of characters to read. This is almost always NA.

value

a function used to convert the body of the response from text or raw in a customized manner, e.g. to uncompress a gzip body. This can also be done explicitly with a call fun(reader$value()) after the body has been read. The advantage of specifying the function in the constructor of the reader is that the end-user doesn't have to know which function to use to do the conversion.

verbose

a logical value indicating whether messages about progress and operations are written on the console as the header and body are processed.

binary

a logical value indicating whether the resulting content is known to be binary (TRUE), known not to be binary (FALSE), or of unknown type (NA).

baseURL

the URL of the request which can be used to follow links to other URLs that are described relative to this.

isHTTP

a logical value indicating whether the request/download uses HTTP or not. If this is NA, we determine this when the header is received. If the caller knows this is an FTP or other request, they can specify this when creating the reader.

encoding

a string that allows the caller to specify and override the encoding of the result. This is used to convert text returned from the server.
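As a rough illustration of the value argument, the sketch below supplies a converter when the reader is created, so that the caller does not need to convert the body afterwards. It reuses the gzip URL from the Examples section. The helper name gunzipText is hypothetical, and the sketch assumes that the body arrives as a raw vector and that R is version 4 or later (the same condition the Examples use for memDecompress).

```r
library(RCurl)

# Hypothetical converter (not part of RCurl): decompress a raw vector
# into a single character string.
gunzipText <- function(x) memDecompress(x, asChar = TRUE)

u <- "https://www.omegahat.net/dd.gz"
if (interactive() && getRversion() >= "4" && url.exists(u)) {
  # The converter is attached to the reader at construction time.
  reader <- dynCurlReader(value = gunzipText)
  curlPerform(url = u, headerfunction = reader$update, curl = reader$curl())
  # Assuming the body was read as raw, value() hands back the converted text.
  cat(reader$value())
}
```

The equivalent explicit form would be gunzipText(reader$value()) on a reader created without the value argument.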

Value

A list with 5 elements, all of which are functions. These are:

update

the function that does the actual reading/processing of the content that libcurl passes to it from the header and the body. This is the work-horse of the reader.

value

a function to get the body of the response

header

a function to get the content of the HTTP header

reset

a function to reset the internal contents which allows the same reader to be re-used in subsequent HTTP requests

curl

accessor function for the curl handle specified in the call to create this dynamic reader object.

This list has the S3 class vector c("DynamicRCurlTextHandler", "RCurlTextHandler", "RCurlCallbackFunction")
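The elements of this list are typically used together. The sketch below shows one possible way to do so; fetchWithReader is a hypothetical wrapper, not part of RCurl, and the URL is the one used in the Examples section. It assumes that calling reset() before each request is enough to reuse a reader.

```r
library(RCurl)

# Hypothetical wrapper (not part of RCurl): run one request through a
# dynamic reader and return both the header and the body.
fetchWithReader <- function(url, reader = dynCurlReader()) {
  reader$reset()   # clear any state left over from an earlier request
  curlPerform(url = url,
              headerfunction = reader$update,  # update sees the header first
              curl = reader$curl())            # must be the reader's own handle
  list(header = reader$header(), body = reader$value())
}

u <- "https://www.omegahat.net/Rcartogram/demo.jpg"
if (interactive() && url.exists(u)) {
  reader <- dynCurlReader()
  first <- fetchWithReader(u, reader)
  print(class(first$body))
  print(first$header)

  # The same reader is reused for a second request after reset().
  second <- fetchWithReader(u, reader)
  print(length(second$body))
}
```

Passing reader$curl() to curlPerform matters because the update function sets the body reader on the handle it was created with.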

Author(s)

Duncan Temple Lang

References

libcurl https://curl.se/

See Also

basicTextGatherer, curlPerform, getURLContent

Examples


   # Each of these examples can be done with getURLContent().
   # These are here just to illustrate the dynamic reader.
if(url.exists("https://www.omegahat.net/Rcartogram/demo.jpg")) withAutoprint({
  header = dynCurlReader()
  curlPerform(url = "https://www.omegahat.net/Rcartogram/demo.jpg",
              headerfunction = header$update, curl = header$curl())
  class( header$value() )
  length( header$value() )
})

if(url.exists("https://www.omegahat.net/dd.gz")) withAutoprint({
     # gzip example.
  header = dynCurlReader()
  curlPerform(url = "https://www.omegahat.net/dd.gz",
              headerfunction = header$update, curl = header$curl())
  class( header$value() )
  length( header$value() )

  if (getRversion() >= "4")
     cat(memDecompress(header$value(), asChar = TRUE))
   ## or   cat(Rcompression::gunzip(header$value()))
})


   # Character encoding example
## Not run: 
  header = dynCurlReader()
  curlPerform(url = "http://www.razorvine.net/test/utf8form/formaccepter.sn",
               postfields = c(text = "ABC", outputencoding =  "UTF-8"),
               verbose = TRUE,
               headerfunction = header$update, curl = header$curl())
  class( header$value() )
  Encoding( header$value() )

## End(Not run)

RCurl documentation built on Sept. 11, 2024, 8:36 p.m.
