getHTMLFormDescription: Construct descriptions of forms in an HTML document

getHTMLFormDescription R Documentation

Construct descriptions of forms in an HTML document

Description

This function is used to read and process an HTML document (either from the file system or via HTTP/FTP) and extract descriptions of the <FORM> elements it contains. Each form is collected into an object of class HTMLFormDescription that describes the method for submitting the form (POST or GET), the URL for the submission, and the elements within the form. These descriptions can be used to mimic the form via functions (or GUIs) within R and validate values being submitted to forms.
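As a rough illustration of the kind of information such a description carries, the sketch below uses only the XML package on a small inline document. The HTML text and the describeForm helper are hypothetical and exist only for this example; they are not part of RHTMLForms and do not reproduce the structure of an HTMLFormDescription object.

```r
library(XML)

# Hypothetical HTML used only for illustration.
txt <- '<html><body>
<form name="search" method="get" action="/cgi-bin/search">
  <input type="text" name="q"/>
  <select name="species">
    <option value="a">A</option>
    <option value="b">B</option>
  </select>
  <input type="submit" value="Go"/>
</form>
</body></html>'

doc <- htmlParse(txt, asText = TRUE)

# Hypothetical helper: summarise one <form> node by its submission
# method, its target URL and the names of its named elements.
describeForm <- function(node) {
  fields <- getNodeSet(node, ".//input | .//select | .//textarea")
  list(
    method   = toupper(xmlGetAttr(node, "method", "GET")),
    action   = xmlGetAttr(node, "action", NA_character_),
    # Elements without a name attribute (here the submit button) drop out.
    elements = unlist(lapply(fields, xmlGetAttr, "name"))
  )
}

forms <- lapply(getNodeSet(doc, "//form"), describeForm)
```

Here forms holds one entry per <form> element, each a plain list with method, action and elements components.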

Usage

getHTMLFormDescription(url, dropButtons = TRUE, ..., baseURL = docName(doc))
getHTMLFormDescriptionViaHandlers(url, ..., multi = TRUE, handlers)

Arguments

url

the URI for the HTML document. If this is a local file, there is no need for

...

arguments that are passed directly to htmlTreeParse

dropButtons

a logical value indicating whether to omit button elements in the form description. These are typically Submit or Reset buttons that are not relevant for submitting the form request from R.

baseURL

the URL of the HTML form. This is used to compute relative URLs for the code associated with the form.

multi

a logical value indicating whether to use the handlers that deal with multiple forms within the document or (FALSE) expect just a single FORM element.

handlers

a collection of functions that are passed to htmlTreeParse as handlers for processing the different HTML elements within the document. The default is to provide an object of class HTMLFormParser, which is expected to have a values function that is called after the entire document has been processed to retrieve the object describing the HTML form(s). If this is not present, the handlers object is returned and the caller is expected to be able to extract the relevant information. This allows the caller to provide their own handlers that offer different processing facilities. If multi is TRUE (the default) and handlers is not specified, we call multiFormElementHandlers to get the handlers object so as to be able to deal with multiple forms within the URI. If multi is FALSE, we call formElementHandlers, which expects just a single form and is marginally more efficient as a result. Either of these handler generator functions can be used directly to create the handlers and called with different arguments to control the target URI and/or the check for dynamic forms, or simply to reuse the same instance across multiple form description queries.
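The handlers mechanism described above follows the usual closure pattern for htmlTreeParse in the XML package. The following is a minimal sketch of that pattern, assuming nothing about the package's own handlers: makeFormHandlers and the inline HTML are hypothetical, and the object returned by values here is just a plain list rather than an HTMLFormDescription.

```r
library(XML)

# Hypothetical handler generator; not the package's own implementation.
makeFormHandlers <- function() {
  actions <- character()
  fields  <- character()
  list(
    # Called for each <form> element: record its action attribute.
    form = function(node, ...) {
      actions <<- c(actions, xmlGetAttr(node, "action", NA_character_))
      node
    },
    # Called for each <input> element: record its name, if it has one.
    input = function(node, ...) {
      fields <<- c(fields, xmlGetAttr(node, "name"))
      node
    },
    # Called by the user after parsing to retrieve what was collected.
    values = function() list(actions = actions, fields = fields)
  )
}

# Hypothetical HTML used only for illustration.
txt <- paste0(
  '<html><body><form action="/search" method="get">',
  '<input type="text" name="q"/>',
  '<input type="submit" value="Go"/>',
  '</form></body></html>'
)

h <- makeFormHandlers()
invisible(htmlTreeParse(txt, asText = TRUE, handlers = h))
res <- h$values()
```

Because the state lives in the closure, the same generator can be called again to obtain a fresh, independent set of handlers for another document.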

Value

If handlers is provided by the caller and is not an object of class HTMLFormParser with a values function element, then the handlers object is returned. The caller is supposed to know how to extract the information.

Otherwise, if handlers is an HTMLMultiFormParser object, a list of HTMLFormDescriptions is returned. The names of the elements in the list are taken from the names of the individual forms, if available. If multi is not specified and there is only a single form in the document, just that description object is returned. This simplifies accessing the elements. If this is not desired, specify multi = TRUE explicitly to have a list returned.

If multi is given the value FALSE, the HTMLFormParser is used and a single object of class HTMLFormDescription is returned.

In either case, if the handlers are instructed to check for a dynamic form (checkDynamic) and any of the elements has an onChange attribute, the form is submitted with each of the different options for that element. A description of the possible values of all the elements that correspond to these different settings is then included in the result in the dynamicElements field. This allows us to handle simple dynamic forms whose possible element values (but not entire structure) change when one element's value is selected. See the species field in the wormbase form for a simple example.
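Since the result may be either a single description or a list of them, calling code that wants uniform handling can normalise it first. The sketch below is one way to do that; asFormList is a hypothetical helper, not a function in RHTMLForms, and the mock objects only imitate the class attribute of a real result.

```r
# Hypothetical helper (not part of RHTMLForms): always return a list of
# form descriptions, wrapping a single description if necessary.
asFormList <- function(x) {
  if (inherits(x, "HTMLFormDescription")) list(x) else x
}

# Mock objects standing in for real results; only the class is imitated.
single <- structure(list(elements = list()), class = "HTMLFormDescription")
many   <- list(search = single, login = single)

oneForm  <- asFormList(single)  # a list of length 1
allForms <- asFormList(many)    # returned unchanged
```

Passing multi = TRUE explicitly, as noted above, avoids the need for such a wrapper when the package's default handlers are used.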

Author(s)

Duncan Temple Lang <duncan@wald.ucdavis.edu>

See Also

multiFormElementHandlers, formElementHandlers, htmlTreeParse

Examples

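# Fetch the page with RCurl, parse it, and describe the forms in the parsed document.
if(require(RCurl) && require(XML)) {
   txt = getURLContent("http://www.google.com")
   doc = htmlParse(txt, asText = TRUE)
   f = getHTMLFormDescription(doc)
}

# Alternatively, pass the URL directly and let the function read the document.
eq = getHTMLFormDescription("http://neic.usgs.gov/neis/epic/epic_global.html")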

omegahat/RHTMLForms documentation built on Nov. 29, 2023, 12:36 a.m.