xmlToList: Convert an XML node/document to a more R-like list

View source: R/tree.R

xmlToList R Documentation

Convert an XML node/document to a more R-like list

Description

This function is an early and simple approach to converting an XML node or document into a more typical R list that contains the data values directly (rather than as XML nodes). It is useful for handling data returned from REST requests or other Web queries, or more generally when parsing XML and wanting to access the content as elements of a list indexed by node name.

For example, given a node of the form

  <x>
    <a>text</a>
    <b foo="1"/>
    <c bar="me">
      <d>a phrase</d>
    </c>
  </x>

we would end up with a list with elements named "a", "b" and "c". "a" would be the string "text"; "b" would contain the named character vector c(foo = "1") (i.e. the attributes); and "c" would contain a list with two elements named "d" and ".attrs". The element corresponding to "d" is a character vector with the single element “a phrase”. The .attrs element of the list is the character vector of attributes from the node <c>...</c>.
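The structure described here can be checked directly. The following is a minimal sketch, assuming the XML package is installed; the variable names (xml_text, res) are illustrative only and are not part of the package.

```r
library(XML)

# The node from the description, written without whitespace between elements.
xml_text = '<x><a>text</a><b foo="1"/><c bar="me"><d>a phrase</d></c></x>'

# Parse the string as XML content and convert the document to a list.
res = xmlToList(xmlParse(xml_text, asText = TRUE))

names(res)     # element names follow the child node names
res$a          # text content of <a>
res$b          # attributes of the empty node <b/>
res$c$d        # text content of the nested node <d>
res$c$.attrs   # attributes of <c>
```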

Usage

xmlToList(node, addAttributes = TRUE, simplify = FALSE,
          attributesAsElements = FALSE)

Arguments

node

the XML node or document to be converted to an R list. This can be an “internal” or C-level node (i.e. XMLInternalNode-class) or a regular R-level node (either XMLNode-class or XMLHashNode).

addAttributes

a logical value which controls whether the attributes of a node are added to the list as a character vector element named .attrs, or set as R attributes on the resulting list. For example, suppose we have a node of the form <foo a="1" b="xyz" c="true"><f>123</f><g>2.2</g></foo>. With addAttributes = TRUE, the result is a list with elements named f, g and .attrs, the last of these being the named character vector c(a = "1", b = "xyz", c = "true"). With addAttributes = FALSE, the result is a list with two elements (corresponding to the subnodes f and g), and the attributes a, b and c of foo are set, with their string values, as R attributes on that 2-element list.
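A short sketch of the two settings, using the node from the paragraph above. It assumes the XML package is installed, and the variable names are illustrative. In the second case the R attributes are read with attr(), and the sketch does not rely on element names.

```r
library(XML)

foo = xmlParse('<foo a="1" b="xyz" c="true"><f>123</f><g>2.2</g></foo>',
               asText = TRUE)

# addAttributes = TRUE: the XML attributes are stored in an element named .attrs
as_element = xmlToList(foo, addAttributes = TRUE)
as_element$f
as_element$.attrs

# addAttributes = FALSE: the XML attributes become R attributes on the list
as_r_attrs = xmlToList(foo, addAttributes = FALSE)
length(as_r_attrs)
attr(as_r_attrs, "b")
```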

attributesAsElements

if TRUE and addAttributes is also TRUE, add each attribute of the XML node as a separate element of the resulting list, rather than collecting them in a single character vector in an element named .attrs. In the example given for addAttributes, the result would be a list with elements named f, g, a, b and c.

simplify

a logical value that controls whether we collapse the list to a vector if the elements all have a common compatible type. Basically, this controls whether we use sapply or lapply.
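As a rough illustration of the difference, the sketch below converts a node with repeated children both ways. It assumes the XML package is installed, and the <items> document is made up for the example. Whether the simplified result is actually an atomic vector may depend on the content and the package version, so the sketch inspects the results with str() rather than assuming a type.

```r
library(XML)

items = xmlParse('<items><item>1</item><item>2</item></items>', asText = TRUE)

# Default: one list element per child node (names may repeat)
as_list = xmlToList(items, simplify = FALSE)
str(as_list)

# simplify = TRUE: ask for sapply-style collapsing where possible
collapsed = xmlToList(items, simplify = TRUE)
str(collapsed)

# In the default case, unlist() yields the values as a character vector
unlist(as_list)
```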

Value

A list whose elements correspond to the children of the top-level node.

Author(s)

Duncan Temple Lang

See Also

xmlTreeParse, getNodeSet and xpathApply; xmlRoot, xmlChildren, xmlApply, [[, etc. for accessing the content of XML nodes.

Examples

library(XML)

tt =
 '<x>
     <a>text</a>
     <b foo="1"/>
     <c bar="me">
        <d>a phrase</d>
     </c>
  </x>'

  # parse to an internal (C-level) document and convert it
  doc = xmlParse(tt)
  xmlToList(doc)

  # use an R-level node representation
  doc = xmlTreeParse(tt)
  xmlToList(doc)


  # add the attributes a and b from foo as R attributes on the list
  v1 = xmlToList('<foo a="1" b="xyz"><f>123</f><g>2.2</g></foo>',
                 addAttributes = FALSE)

  # add the attributes a and b as a character vector to the resulting list,
  # in an element named .attrs
  v2 = xmlToList('<foo a="1" b="xyz"><f>123</f><g>2.2</g></foo>',
                 addAttributes = TRUE)

  # add a and b as regular elements of the resulting list to get
  # f, g, a and b
  v3 = xmlToList('<foo a="1" b="xyz"><f>123</f><g>2.2</g></foo>',
                 addAttributes = TRUE, attributesAsElements = TRUE)

omegahat/XML documentation built on Jan. 17, 2024, 6:47 p.m.