formElementHandlers: Gather information from HTML form elements

View source: R/elementFetch.R

formElementHandlers    R Documentation

Gather information from HTML form elements

Description

These functions are used when parsing HTML pages containing forms to gather a description of the individual forms. The idea is to read the individual elements within an HTML form and provide their details in terms of:

  • the element name

  • the default value

  • the set of possible values

  • whether it is visible, i.e. settable by the user, or simply a hidden field

With this information, we can automate access to the HTML form via an S function that accepts the same options a user would specify in the browser, but without the manual operation and without having to manage the returned data by hand, e.g. saving it as a file and reading it into S.
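As a rough illustration of this idea, the four pieces of information could drive a generated function along the following lines. This is a minimal base R sketch, not the package's implementation: makeFormFunction, the layout of elements, and the field names default, values and visible are all hypothetical, and no HTTP request is made.

```r
# Hypothetical description of two form elements, using the four pieces of
# information listed above (name, default, possible values, visibility).
elements <- list(
  species = list(default = "mouse", values = c("mouse", "human"), visible = TRUE),
  session = list(default = "abc123", values = NULL, visible = FALSE)
)

# Build a function whose arguments are the visible elements. Hidden
# elements are passed along unchanged. The result is the set of name/value
# pairs that would be submitted.
makeFormFunction <- function(elements) {
  function(...) {
    user <- list(...)
    isVisible <- vapply(elements, function(e) e$visible, logical(1))
    visible <- names(elements)[isVisible]
    unknown <- setdiff(names(user), visible)
    if (length(unknown))
      stop("not a settable form element: ", paste(unknown, collapse = ", "))
    args <- lapply(elements, function(e) e$default)
    for (nm in names(user)) {
      allowed <- elements[[nm]]$values
      # A NULL set of possible values means free text, so anything is accepted
      if (!is.null(allowed) && !(user[[nm]] %in% allowed))
        stop("invalid value for ", nm)
      args[[nm]] <- user[[nm]]
    }
    unlist(args)
  }
}

query <- makeFormFunction(elements)
res <- query(species = "human")
```

Here the caller overrides only the visible element, while the hidden field keeps its default and is submitted alongside it.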

We ignore JavaScript-related operations.

multiFormElementHandlers can handle multiple forms within a single page. formElementHandlers accumulates all the form elements into a single structure and does not observe the boundaries between forms. If you have an HTML page with potentially more than one form, use multiFormElementHandlers. Most users do not need to make this choice themselves, as it is hidden behind the function getHTMLFormDescription.
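The difference between pooling elements and respecting form boundaries can be sketched with the XML package directly. This is only an illustration on a made-up page; it does not call either handler function and says nothing about how they are implemented.

```r
library(XML)

# A made-up page containing two separate forms
html <- '<html><body>
<form action="search.cgi"><input type="text" name="q"></form>
<form action="login.cgi"><input type="text" name="user"></form>
</body></html>'

doc <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE)

# Pooling every input in the page loses track of which form owns it
allNames <- xpathSApply(doc, "//input", xmlGetAttr, "name")

# Grouping by the enclosing <form> keeps the boundaries
forms <- getNodeSet(doc, "//form")
perForm <- lapply(forms, function(f) xpathSApply(f, ".//input", xmlGetAttr, "name"))
names(perForm) <- sapply(forms, xmlGetAttr, "action")
```

With the pooled vector there is no way to tell that q belongs to search.cgi and user to login.cgi, which is why a page with several forms calls for the per-form grouping.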

Usage

formElementHandlers(url = NULL, checkDynamic = TRUE, dropButtons = TRUE)
multiFormElementHandlers(url = NULL, checkDynamic = TRUE, dropButtons = TRUE)

Arguments

url

the URL of the HTML page. This is not needed to create the description of the form elements, which is done via a call to htmlTreeParse, but it is used to make the description of the form fully self-describing.

checkDynamic

a logical value indicating whether to check the form for dynamic elements. If this is TRUE, then once the description of the form is complete we call checkDynamicForm. This processes each dynamic element (i.e. one with an onChange attribute) by submitting the form with different values for that element, in order to collect the values that all the elements accept for each possible input of the dynamic element(s).
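The detection half of this step might look roughly like the sketch below, which uses the XML package on a made-up form. It is not the code of checkDynamicForm, and the resubmission of the form for each value is deliberately left out. The attribute is written in lower case in the example; HTML attribute names are case-insensitive.

```r
library(XML)

# A made-up form in which changing "organism" would reload the page
html <- '<html><body><form action="query.cgi">
<select name="organism" onchange="this.form.submit()">
  <option value="mm">Mouse</option><option value="hs">Human</option>
</select>
<select name="chromosome"><option value="1">chr1</option></select>
</form></body></html>'

doc <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE)

# Elements carrying an onchange attribute are the candidates for dynamic behaviour
dynamic <- xpathSApply(doc, "//form//*[@onchange]", xmlGetAttr, "name")

# The values of the dynamic element that would each have to be submitted in turn
toTry <- xpathSApply(doc, "//select[@name='organism']/option", xmlGetAttr, "value")
```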

dropButtons

a logical value indicating whether to omit button elements in the form description. These are typically Submit or Reset buttons that are not relevant for submitting the form request from R.
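The effect of dropping buttons can be sketched as a filter on the type attribute of the input elements. The set of types treated as buttons here (submit, reset, button) is an assumption made for the example, not a statement of what the package checks.

```r
library(XML)

# A made-up form with one text field and two buttons
html <- '<html><body><form action="query.cgi">
<input type="text" name="gene">
<input type="submit" name="go" value="Search">
<input type="reset" name="clear" value="Reset">
</form></body></html>'

doc <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE)
inputs <- getNodeSet(doc, "//form//input")

# An input without a type attribute is treated as a text field
types <- tolower(sapply(inputs, xmlGetAttr, "type", "text"))
isButton <- types %in% c("submit", "reset", "button")

# Names of the elements that remain once buttons are dropped
kept <- sapply(inputs[!isButton], xmlGetAttr, "name")
```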

Details

This uses the htmlTreeParse function in the XML parsing package to gather up and process the different HTML form elements in the HTML document. It organizes the information into a more programmatically accessible structure.
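A much reduced version of this gathering step is sketched below, assuming the XML package and a made-up form. The field names form and textareas follow the Value section; the defaults field and its layout are hypothetical, and the real functions may organize the information differently.

```r
library(XML)

# A made-up form with one text input and one textarea
html <- '<html><body>
<form action="/cgi-bin/submit" method="post">
<input type="text" name="title" value="untitled">
<textarea name="comments"></textarea>
</form></body></html>'

doc <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE)
form <- getNodeSet(doc, "//form")[[1]]

desc <- list(
  # Attributes of the FORM element as a named character vector
  form = xmlAttrs(form),
  # Names of the TEXTAREA elements
  textareas = xpathSApply(form, ".//textarea", xmlGetAttr, "name"),
  # Hypothetical field: default value of each input, named by element name
  defaults = structure(
    xpathSApply(form, ".//input", xmlGetAttr, "value", ""),
    names = xpathSApply(form, ".//input", xmlGetAttr, "name"))
)
```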

Value

An object of class HTMLFormDescription.

inputs

a list describing the different select elements. Each element corresponds to a separate select element and is a named character vector. The values in the character vector are the text of the option elements and the names are the corresponding value attributes, which are what is submitted if that option is selected.
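The shape described here, a named character vector per select element, can be reproduced on a made-up form as follows. describeSelect is a hypothetical helper, and naming the outer list by the name attribute of each select element is an assumption made for the example.

```r
library(XML)

# A made-up form with a single select element
html <- '<html><body><form action="query.cgi">
<select name="organism">
  <option value="mm">Mouse</option>
  <option value="hs">Human</option>
</select>
</form></body></html>'

doc <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE)

# Values are the option text, names are the value attributes that get submitted
describeSelect <- function(node) {
  opts <- getNodeSet(node, ".//option")
  structure(sapply(opts, xmlValue), names = sapply(opts, xmlGetAttr, "value"))
}

selects <- getNodeSet(doc, "//select")
result <- lapply(selects, describeSelect)
names(result) <- sapply(selects, xmlGetAttr, "name")
```

Looking up names(result$organism) then gives the strings to send to the server, while the values are the labels a user would see in the browser.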

textareas

the names of the TEXT or TEXTAREA elements.

fixed
form

the attributes (a named character vector) giving the HTML attributes associated with the FORM element. These describe the action, the URI for submission, the encoding format, etc.

url

this is supplied when the handlers are created and allows the complete information about the form(s) to be entirely self-describing, i.e. to resolve relative links, etc. for the POST actions.

hidden

a list containing character vectors of length 1 or more. Each element in the list corresponds to an HTML element of type "hidden" with a name. Such elements can have multiple values for the same name, i.e. the name="x" can be repeated and all these values must be sent as part of the form.
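Grouping repeated hidden names into one list element each, and expanding them again for submission, can be sketched in base R. The input vectors are made up, and the name=value strings are only a simplified stand-in for whatever encoding the form actually requires.

```r
# Name/value pairs as they might be read from <input type="hidden"> elements;
# the name "x" appears twice
hiddenNames  <- c("session", "x", "x")
hiddenValues <- c("abc123", "1", "2")

# Group the values by name, keeping the order in which names first appear
hidden <- split(hiddenValues, factor(hiddenNames, levels = unique(hiddenNames)))

# Expand back into one name=value pair per value, so that every repeated
# value is sent as part of the form
pairs <- paste(rep(names(hidden), lengths(hidden)), unlist(hidden), sep = "=")
```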

inputdefaults
textareadefaults
selectdefaults

...

Note

Currently, we organize the information from a form into a simple HTMLFormDescription object which is an S3-style class. This maintains the information about the form in separate fields and one must look across these fields to understand an individual element. For example, one would get its

Author(s)

Duncan Temple Lang <duncan@wald.ucdavis.edu>


omegahat/RHTMLForms documentation built on Nov. 29, 2023, 12:36 a.m.