XML to xml2The R package XML for parsing and manipulation of XML documents in R is not actively maintained anymore, but used by many:
The R package xml2 is an actively maintained, more recent alternative.
This file documents useful resources and steps for moving from XML to xml2.
The itdepends package helps with finding all usages of XML, see https://speakerdeck.com/jimhester/it-depends?slide=38
devtools::install_github("jimhester/itdepends") library("itdepends") itdepends::dep_locate("XML")
| XML | xml2 | Comment |
|-------|--------|---------|
| XML::getNodeSet(doc = <document object>, path = "<XPath expression>") or XML::xpathApply(...) | xml2::xml_find_all(..) and xml2::xml_find_one(..) with x = <node>, xpath = "<XPath 1.0 expression>" | Find matching nodes value of a node's attribute |
| XML::htmlTreeParse(<path>, asText = <treat file as text>) | xml2::read_html(<path, URL, connection, or literal xml>) | parse HTML document |
| XML::isXMLString("<string>") | No direct equivalent, can try to parse... | Heuristically determine if string is XML |
| XML::toString.XMLNode(<node>) | as.character(<document or node>) | object to character |
| XML::xmlAttrs(node = <node object>) | xml2::xml_attrs(x = <document, node, or node set>) | Get the attributes of a node, both return a named character vector. |
| XML::xmlApply(X = <node>) and XML::xmlSApply(..) | functions xml2::xml_attrs(..) and xml2::xml_contents(..) are vectorized | Apply function to each child of a node |
| XML::xmlChildren(x = <node object>)[["<name of the sub-node>"]] | xml2::xml_child(x = <node>, search = <number, or name of the sub-node>) (only elements) and xml2::xml_contents(..) for all nodes | Get sub-nodes of a node |
| XML::xmlElementsByTagName(el = <node object>, name = "<name to match>") | xml2::xml_find_all(x = <document, node, node set>, xpath = "<name to match>") | Retrieve children matching tag name (children/sub-elements) |
| XML::xmlGetAttr(node = <node object>, name = "<attribute name>", default = "<default>") | xml2::xml_attr(x = <document, node, or node set>, attr = "<attribute name>") | Get value of a node's attribute |
| XML::xmlName(node = <node object>) | xml2::xml_name(x = <document, node, or node set>) | Get name of a node |
| XML::xmlParse(..) | xml2::read_xml(..) | Unexposed method in XML ? |
| XML::xmlParseDoc(file = <file name> or "<xml content>", asText = !file.exists(file)) | xml2::read_xml(x = <string, connection, URL, or raw vector>) | parse XML document |
| XML::xmlParseString(content = "<string>") | xml2::read_xml(x = <string, connection, URL, or raw vector>) | convenience function XML to node/tree |
| XML::xmlRoot(x = <node object>) | xml2::xml_root(x = <document, node, or node set> | Get top-level node |
| XML::xmlSize(obj = <node or document object>) | xml2::xml_length() | Note that xml_length(..) does not need to go to the root first, i.e. XML::xmlSize(XML::xmlRoot(old)) == xml2::xml_length(new) |
| XML::xmlToList(node = <xml node or document>) | xml2::as_list(x = <document, node, or node set>) | convert to R-like list; difference: as_list does not drop the root element |
| XML::xmlTreeParse(file = <file name> or "<xml content>", asText = !file.exists(file)) | | parse XML document |
| if(!is.null(<node object>[["<child name>"]])) { | (inherits(xml_child(<node object>, "<child name>"), "xml_missing") | Checking for child node existence |
| XML::xmlValue(<node object>) | xml2::xml_text(x = <document, node, or node set>) | Get/Set contents of a leaf node |
Common snippets
| XML | xml2 | Comment |
|-------|--------|---------|
| if (!is.null(XML::xmlChildren(x = obj)[[<node name>]])) | if (!inherits(xml2::xml_find_first(x = obj, xpath = <node name>), "xml_missing") | Check if element exists. |
| if(!is.null(XML::xmlAttrs(node = obj)[["href"]])) | if(!is.na(xml2::xml_attr(x = obj, attr = "href"))) | Checking for potentiall non-existing attribute |
| XML | xml2 | Comment |
|-------|--------|---------|
| XML::addAttributes(node = <node object>, ..., .attrs = <character vector with attribute names>, append = <replace or add>) | xml2::xml_set_attrs(x = <document, node, node set>, value = <named character vector>) to set multiple attributes and overwrite existing ones, or xml2::xml_set_attr(x = <node>, attr = <name>, value = <value>) to append a single attribute | Add attributes to a node; in xml2 no re-assigning the object is needed, i.e. no doc <- XML::addAttributes(node = doc, ...) |
| XML::addChildren(node = <node object>, kids = list()) | xml2::xml_add_child(.x = <document or nodeset>, .value = <document, node or nodeset>) | Add child nodes to a node |
| XML::saveXML(doc = <xml document object>, file = "<file name>") | xml2::write_xml(x = <document or node>, file = "<path or connection">) | Write XML document to string or file |
| XML::xmlNamespaceDefinitions(x = <node>) | xml2::xml_ns(x = <document, node, or node set>) | Get namespace definitions from a node |
| XML::xmlNode(name = "<node name>") | xml2::xml_new_document %>% xml2::xml_add_child("<node name>") or (preferred in docs) xml2::xml_new_root("<node name>") | Create a new node |
| XML::xmlValue() | xml2::xml_text(x = <document, node, or node set>) | Get/Set contents of a leaf node |
| XML | xml2 | Comment |
|-------|--------|---------|
| XMLAbstractDocument | xml_document | .. |
| XMLAbstractNode, XMLCommentNode, XMLTextNode, ... | xml_node | .. |
| ? | xml_missing | .. |
The following steps were applied in switching from XML to xml2 for the package sos4R.
This is not a "clean" process, but hopefully provides useful input for other's doing the switch.
Ideally the lessons learned on what can be "regex-ed" and what needs manual interaction go into the above tables at a later stage.
addAttributes\((?!node) replaced with XML::addAttributes(node =addChildren\(node replaced with XML::addChildren(nodegetNodeSet\((?!doc) replaced with XML::getNodeSet(doc =isXMLString\((?!str) replaced with XML::isXMLString(str =saveXML\((?!doc) replaced with XML::saveXML(doc =xmlAttrs\((?!node) replaced with XML::xmlAttrs(node =xmlChildren\((?!x) replaced with XML::xmlChildren(x =xmlElementsByTagName replaced with XML::xmlElementsByTagNamexmlGetAttr\((?!node) replaced with XML::xmlGetAttr(node =xmlName\((?!node) replaced with XML::xmlName(node =xmlNode\((?!name) and xmlNode\(name = replaced with XML::xmlNode(name =xmlParse\( replaced with XML::xmlParse(file =xmlParseDoc\((?!file) replaced with XML::xmlParseDoc(file =xmlParseString\( replaced with XML::xmlParseString(content =xmlRoot\((?!x) replaced with XML::xmlRoot(x =xmlSize\( replaced with XML::xmlSize(obj =xmlToList\( replaced with XML::xmlToList(node =xmlTreeParse\( replaced with XML::xmlTreeParse(file =xmlValue\((?!x) replaced with XML::xmlValue(x =Imports: XML instead of Depends:*.R, files in /sandbox/ ignored for manual corrections; order driven by running a basic parsing test and see where it fails next)XML::xmlParseDocXML::xmlParseDoc(file = with xml2::read_xml(x = (26 occurrences), asText = TRUE by replacing it with `` (blank, 11 occurrences)options into vector with stringsc(XML::NOERROR, XML::RECOVER) with SosDefaultParsingOptions()xmlParseOptions everywhereXML::xmlParseStringencodeXML for signature "character"XML::xmlParseparseFileXML::xmlRootXML::xmlRoot with xml2::xml_root (25 occurrences)XML::xmlNameXML::xmlName(node = with xml2::xml_name(x = (30 occurrences), ns = SosAllNamespaces() later to have names with prefixXML::xmlAttrsXML::xmlAttrs(node = with xml2::xml_attrs(x = (3 occurrences)xmlAttrs (must have slipped by before)xml2::xml_attrs(x = obj)[["href"]] does not work because if attribute href does not exist there will be a "subscript out of bounds" error. Need to use xml2::xml_attrs\(x = (.*)\[\[ and fix manually to xml2::xml_attrs(x = obj, attr = "<attribute name>") and update subsequent is.null(..) checks to use is.na(..)XML::xmlGetAttrXML::xmlGetAttr\(node = (.*), name = with xml2::xml_attr(x = $1, attr = (55 occurrences)name =, can also fix indentation then or remove newlinexmlGetAttr was used withn lapply(..) or sapply(..)XML::xmlValueXML::xmlValue\(x = with xml2::xml_text(x = (45 occurrences)XML::xmlChildrenXML::xmlChildren\(x = with xml2::xml_children(x = (22 occurrences)XML::xmlChildren(x = obj)[[gmlTimeInstantName]] does not work because xml2::xml_children(..) does not return a named list. Need to run xml2::xml_find_all(x = obj, xpath = gmlTimeInstant) or xml2::xml_find_first(..) then. Search for xml2::xml_children\(x = (.*)\[\[ to fix those manually (10 results)..find_first returns missing node: is.na(xml2::xml_find_first(x, "f")) or inherits(xml2::xml_find_first(x, "f"), "xml_missing")..find_all returns (potentially empty) nodeset: length(xml2::xml_find_all(x, "f"))XMLAbstractNode and XMLInternalDocument for slots in S4 classes with ANY and the default prototype to xml2::xml_missing(), will have to handle stuff manually around these classesxml2 repo: https://github.com/r-lib/xml2/issues/248SosAllNamespaces() and add namespaces to all the xxxName constants in R/Constants.Rtest_exceptionreports.R completetest_sams.R added and parsing fixedXML::getNodeSet manually switched to xml2::xml_find_all(..) and xml2::xml_find_one(..), because XPath-based getting of sub-nodes with xml2 also requires proper namespaces and some handling can be simplified because of vectorised xml2::xml_text(..).XML::xmlSizeXML::saveXMLXML::saveXML(doc = with xml2::write_xml(x = (6 occurrences), no parameters in saveXML besides doc and file were usedNAMESPACE to import xml2 and not XMLtest_sensors.R workXML::isXMLStringgrepl("^<(.*)>$", "...").filterXmlChildren and .filterXmlOnlyNoneTexts manually using xml2::xml_child(..), xml2::xml_find_first(..) or xml2::xml_find_all(..)
obj[[ because subsetting with [[ does not work with XML (107 occurrences at this point!)obj\[\[(.*?)\]\] with xml2::xml_child(x = obj, search = $1, ns = SosAllNamespaces())obj[[..]] was used (file PrintShowStructureSummary-methods.R)obj[["elementCount"]][["Count"]][["value"]] > search for SosAllNamespaces())[[ and fix manually to use XPath (4 occurrences).children[[is.null\(\. with some XML object, should be is.na(..) which picks up on "xml_missing" objectsparseOwsRangeparseSosFilter_CapabilitiesparseOwsServiceIdentificationparseTimeparseSosObservationOffering (also for 2.0.0)test_sensors.RXML::addAttributes.attrs is used, which is replaced with xml2::xml_set_attrs(), and sometimes not (single ...), which is replaced with xml2::xml_set_attr(), the _set_attr variants operate directly on the object (no need to re-assign), and often statements are multi-line (18 occurrences).sos100_NamespaceDefinitionsForAllXML::xmlNode and XML::addChildrenxml2::xml_new_root("<node name>") and xml2::xml_add_child("<node name>")attrs parameter replaced with xml2::xml_set_attrs()XML::addChildren with "append = TRUE" replace with a for loop and xml2::xml_add_child(..)Limitations of regexes for the actual switch are due to multi-line statements and the result of functions not being the same.
Especially the subsetting with [[ used extensively does not work the same way anymore.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.