`XML` to `xml2`
The R package `XML` for parsing and manipulating XML documents in R is no longer actively maintained, but it is still widely used.
The R package `xml2` is an actively maintained, more recent alternative.
This file documents useful resources and steps for moving from `XML` to `xml2`.
The `itdepends` package helps with finding all usages of `XML`, see https://speakerdeck.com/jimhester/it-depends?slide=38

    devtools::install_github("jimhester/itdepends")
    library("itdepends")
    itdepends::dep_locate("XML")

| `XML` | `xml2` | Comment |
|-------|--------|---------|
| `XML::getNodeSet(doc = <document object>, path = "<XPath expression>")` or `XML::xpathApply(...)` | `xml2::xml_find_all(..)` and `xml2::xml_find_first(..)` (the older `xml2::xml_find_one(..)` is deprecated) with `x = <node>, xpath = "<XPath 1.0 expression>"` | Find matching nodes |
| `XML::htmlTreeParse(<path>, asText = <treat file as text>)` | `xml2::read_html(<path, URL, connection, or literal xml>)` | Parse HTML document |
| `XML::isXMLString("<string>")` | No direct equivalent, can try to parse... | Heuristically determine if string is XML |
| `XML::toString.XMLNode(<node>)` | `as.character(<document or node>)` | Object to character |
| `XML::xmlAttrs(node = <node object>)` | `xml2::xml_attrs(x = <document, node, or node set>)` | Get the attributes of a node, both return a named character vector. |
| `XML::xmlApply(X = <node>)` and `XML::xmlSApply(..)` | Functions `xml2::xml_attrs(..)` and `xml2::xml_contents(..)` are vectorized | Apply function to each child of a node |
| `XML::xmlChildren(x = <node object>)[["<name of the sub-node>"]]` | `xml2::xml_child(x = <node>, search = <number, or name of the sub-node>)` (only elements) and `xml2::xml_contents(..)` for all nodes | Get sub-nodes of a node |
| `XML::xmlElementsByTagName(el = <node object>, name = "<name to match>")` | `xml2::xml_find_all(x = <document, node, node set>, xpath = "<name to match>")` | Retrieve children matching tag name (children/sub-elements) |
| `XML::xmlGetAttr(node = <node object>, name = "<attribute name>", default = "<default>")` | `xml2::xml_attr(x = <document, node, or node set>, attr = "<attribute name>")` | Get value of a node's attribute |
| `XML::xmlName(node = <node object>)` | `xml2::xml_name(x = <document, node, or node set>)` | Get name of a node |
| `XML::xmlParse(..)` | `xml2::read_xml(..)` | Unexposed method in `XML`? |
| `XML::xmlParseDoc(file = <file name> or "<xml content>", asText = !file.exists(file))` | `xml2::read_xml(x = <string, connection, URL, or raw vector>)` | Parse XML document |
| `XML::xmlParseString(content = "<string>")` | `xml2::read_xml(x = <string, connection, URL, or raw vector>)` | Convenience function, XML string to node/tree |
| `XML::xmlRoot(x = <node object>)` | `xml2::xml_root(x = <document, node, or node set>)` | Get top-level node |
| `XML::xmlSize(obj = <node or document object>)` | `xml2::xml_length()` | Note that `xml_length(..)` does not need to go to the root first, i.e. `XML::xmlSize(XML::xmlRoot(old)) == xml2::xml_length(new)` |
| `XML::xmlToList(node = <xml node or document>)` | `xml2::as_list(x = <document, node, or node set>)` | Convert to R-like list; difference: `as_list` does not drop the root element |
| `XML::xmlTreeParse(file = <file name> or "<xml content>", asText = !file.exists(file))` | | Parse XML document |
| `if (!is.null(<node object>[["<child name>"]])) {` | `if (!inherits(xml2::xml_child(<node object>, "<child name>"), "xml_missing")) {` | Check for child node existence |
| `XML::xmlValue(<node object>)` | `xml2::xml_text(x = <document, node, or node set>)` | Get/Set contents of a leaf node |
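
As a minimal sketch of how several of the `xml2` functions from the table fit together, the following parses a small document and queries it. The sample document and all variable names are made up for illustration and are not taken from `sos4R`.

```r
library(xml2)

# Hypothetical sample document, used only for illustration
doc <- read_xml(
  '<catalog>
     <book id="b1"><title>First</title></book>
     <book id="b2"><title>Second</title></book>
   </catalog>'
)

# Name of the top-level node (XML::xmlName(XML::xmlRoot(..)) in XML)
root_name <- xml_name(xml_root(doc))

# Number of child elements; no need to go to the root first
n_books <- xml_length(doc)

# XPath queries return a node set; xml_attr() and xml_text() are vectorized
books <- xml_find_all(doc, "//book")
ids <- xml_attr(books, "id")
titles <- xml_text(xml_find_all(doc, "//book/title"))

# First child element with a given name, and all of its attributes
first_book <- xml_child(doc, "book")
first_attrs <- xml_attrs(first_book)
```

Because the accessors are vectorized over node sets, most `XML::xmlApply(..)` and `XML::xmlSApply(..)` loops can be dropped rather than translated.
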
Common snippets

| `XML` | `xml2` | Comment |
|-------|--------|---------|
| `if (!is.null(XML::xmlChildren(x = obj)[[<node name>]]))` | `if (!inherits(xml2::xml_find_first(x = obj, xpath = <node name>), "xml_missing"))` | Check if element exists. |
| `if (!is.null(XML::xmlAttrs(node = obj)[["href"]]))` | `if (!is.na(xml2::xml_attr(x = obj, attr = "href")))` | Check for a potentially non-existing attribute. |
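
The two existence checks can be tried out with a small sketch like the one below. The node `obj` and its element and attribute names are hypothetical.

```r
library(xml2)

# Hypothetical node with one child element and one attribute
obj <- read_xml('<link href="http://example.org"><label>home</label></link>')

# Element checks: xml_find_first() returns an "xml_missing" object if nothing matches
has_label <- !inherits(xml_find_first(obj, "label"), "xml_missing")
has_title <- !inherits(xml_find_first(obj, "title"), "xml_missing")

# Attribute checks: xml_attr() returns NA for a non-existing attribute
has_href <- !is.na(xml_attr(obj, "href"))
has_rel <- !is.na(xml_attr(obj, "rel"))

# xml_find_all() returns a (potentially empty) node set instead
n_titles <- length(xml_find_all(obj, "title"))
```

Neither check throws an error for a missing element or attribute, which is why `is.null(..)` tests from `XML`-based code have to become `inherits(.., "xml_missing")` or `is.na(..)` tests.
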

| `XML` | `xml2` | Comment |
|-------|--------|---------|
| `XML::addAttributes(node = <node object>, ..., .attrs = <character vector with attribute names>, append = <replace or add>)` | `xml2::xml_set_attrs(x = <document, node, node set>, value = <named character vector>)` to set multiple attributes and overwrite existing ones, or `xml2::xml_set_attr(x = <node>, attr = <name>, value = <value>)` to append a single attribute | Add attributes to a node; in `xml2` no re-assigning the object is needed, i.e. no `doc <- XML::addAttributes(node = doc, ...)` |
| `XML::addChildren(node = <node object>, kids = list())` | `xml2::xml_add_child(.x = <document or nodeset>, .value = <document, node or nodeset>)` | Add child nodes to a node |
| `XML::saveXML(doc = <xml document object>, file = "<file name>")` | `xml2::write_xml(x = <document or node>, file = "<path or connection>")` | Write XML document to string or file |
| `XML::xmlNamespaceDefinitions(x = <node>)` | `xml2::xml_ns(x = <document, node, or node set>)` | Get namespace definitions from a node |
| `XML::xmlNode(name = "<node name>")` | `xml2::xml_new_document() %>% xml2::xml_add_child("<node name>")` or (preferred in docs) `xml2::xml_new_root("<node name>")` | Create a new node |
| `XML::xmlValue()` | `xml2::xml_text(x = <document, node, or node set>)` | Get/Set contents of a leaf node |
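
A minimal sketch of building and writing a document with the `xml2` functions from this table follows. Element names, attribute names, and the temporary output file are assumptions made for the example.

```r
library(xml2)

# Hypothetical element and attribute names, used only for illustration
doc <- xml_new_root("observation")

# Setters modify the object in place, so no re-assignment is needed
xml_set_attr(doc, "id", "obs-1")

# Unnamed extra arguments to xml_add_child() become the text of the new node
xml_add_child(doc, "value", "42")
value_node <- xml_child(doc, "value")

# Set several attributes at once from a named character vector
xml_set_attrs(value_node, c(unit = "m", quality = "good"))

# Write to a temporary file and read it back to confirm the round trip
out_file <- tempfile(fileext = ".xml")
write_xml(doc, out_file)
reread <- read_xml(out_file)
```

Since `value_node` refers to a node inside `doc`, changing it also changes the document. This is the main behavioural difference to the copy-and-reassign style of `XML::addAttributes(..)`.
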

| `XML` | `xml2` | Comment |
|-------|--------|---------|
| `XMLAbstractDocument` | `xml_document` | .. |
| `XMLAbstractNode`, `XMLCommentNode`, `XMLTextNode`, ... | `xml_node` | .. |
| ? | `xml_missing` | .. |

The following steps were applied in switching from `XML` to `xml2` for the package `sos4R`.
This is not a "clean" process, but it hopefully provides useful input for others doing the switch.
Ideally, the lessons learned on what can be "regex-ed" and what needs manual interaction will go into the tables above at a later stage.

- Make all `XML` calls explicit (prefix `XML::`) and name the first argument, using these regexes:
  - `addAttributes\((?!node)` replaced with `XML::addAttributes(node =`
  - `addChildren\(node` replaced with `XML::addChildren(node`
  - `getNodeSet\((?!doc)` replaced with `XML::getNodeSet(doc =`
  - `isXMLString\((?!str)` replaced with `XML::isXMLString(str =`
  - `saveXML\((?!doc)` replaced with `XML::saveXML(doc =`
  - `xmlAttrs\((?!node)` replaced with `XML::xmlAttrs(node =`
  - `xmlChildren\((?!x)` replaced with `XML::xmlChildren(x =`
  - `xmlElementsByTagName` replaced with `XML::xmlElementsByTagName`
  - `xmlGetAttr\((?!node)` replaced with `XML::xmlGetAttr(node =`
  - `xmlName\((?!node)` replaced with `XML::xmlName(node =`
  - `xmlNode\((?!name)` and `xmlNode\(name =` replaced with `XML::xmlNode(name =`
  - `xmlParse\(` replaced with `XML::xmlParse(file =`
  - `xmlParseDoc\((?!file)` replaced with `XML::xmlParseDoc(file =`
  - `xmlParseString\(` replaced with `XML::xmlParseString(content =`
  - `xmlRoot\((?!x)` replaced with `XML::xmlRoot(x =`
  - `xmlSize\(` replaced with `XML::xmlSize(obj =`
  - `xmlToList\(` replaced with `XML::xmlToList(node =`
  - `xmlTreeParse\(` replaced with `XML::xmlTreeParse(file =`
  - `xmlValue\((?!x)` replaced with `XML::xmlValue(x =`
- Use `Imports:` `XML` instead of `Depends:`
- Replace functions one by one (only in `*.R` files, files in `/sandbox/` ignored for manual corrections; order driven by running a basic parsing test and seeing where it fails next):
  - `XML::xmlParseDoc`
    - `XML::xmlParseDoc(file =` replaced with `xml2::read_xml(x =` (26 occurrences)
    - `, asText = TRUE` removed by replacing it with an empty string (blank, 11 occurrences)
    - `options` changed into a vector with strings
    - `c(XML::NOERROR, XML::RECOVER)` replaced with `SosDefaultParsingOptions()`
    - `xmlParseOptions` everywhere
  - `XML::xmlParseString`
    - `encodeXML` for signature `"character"`
  - `XML::xmlParse`
    - `parseFile`
  - `XML::xmlRoot`
    - `XML::xmlRoot` replaced with `xml2::xml_root` (25 occurrences)
  - `XML::xmlName`
    - `XML::xmlName(node =` replaced with `xml2::xml_name(x =` (30 occurrences)
    - `, ns = SosAllNamespaces()` added later to have names with prefix
  - `XML::xmlAttrs`
    - `XML::xmlAttrs(node =` replaced with `xml2::xml_attrs(x =` (3 occurrences)
    - `xmlAttrs` (must have slipped by before)
    - `xml2::xml_attrs(x = obj)[["href"]]` does not work, because if the attribute `href` does not exist there will be a "subscript out of bounds" error. Need to search for `xml2::xml_attrs\(x = (.*)\[\[`, fix manually to `xml2::xml_attr(x = obj, attr = "<attribute name>")`, and update subsequent `is.null(..)` checks to use `is.na(..)`
  - `XML::xmlGetAttr`
    - `XML::xmlGetAttr\(node = (.*), name =` replaced with `xml2::xml_attr(x = $1, attr =` (55 occurrences)
    - `name =`, can also fix indentation then or remove newline
    - `xmlGetAttr` was used within `lapply(..)` or `sapply(..)`
  - `XML::xmlValue`
    - `XML::xmlValue\(x =` replaced with `xml2::xml_text(x =` (45 occurrences)
  - `XML::xmlChildren`
    - `XML::xmlChildren\(x =` replaced with `xml2::xml_children(x =` (22 occurrences)
    - `XML::xmlChildren(x = obj)[[gmlTimeInstantName]]` does not work because `xml2::xml_children(..)` does not return a named list. Need to run `xml2::xml_find_all(x = obj, xpath = gmlTimeInstant)` or `xml2::xml_find_first(..)` then. Search for `xml2::xml_children\(x = (.*)\[\[` to fix those manually (10 results)
    - `..find_first` returns missing node: `is.na(xml2::xml_find_first(x, "f"))` or `inherits(xml2::xml_find_first(x, "f"), "xml_missing")`
    - `..find_all` returns (potentially empty) nodeset: `length(xml2::xml_find_all(x, "f"))`
  - `XMLAbstractNode` and `XMLInternalDocument` replaced for slots in S4 classes with `ANY` and the default prototype set to `xml2::xml_missing()`; will have to handle stuff manually around these classes
    - `xml2` repo: https://github.com/r-lib/xml2/issues/248
  - `SosAllNamespaces()`, and namespaces added to all the `xxxName` constants in `R/Constants.R`
  - `test_exceptionreports.R` complete
  - `test_sams.R` added and parsing fixed
  - `XML::getNodeSet` manually switched to `xml2::xml_find_all(..)` and `xml2::xml_find_first(..)` (the older `xml2::xml_find_one(..)` is deprecated), because XPath-based getting of sub-nodes with `xml2` also requires proper namespaces, and some handling can be simplified because of the vectorised `xml2::xml_text(..)`.
  - `XML::xmlSize`
  - `XML::saveXML`
    - `XML::saveXML(doc =` replaced with `xml2::write_xml(x =` (6 occurrences), no parameters in `saveXML` besides `doc` and `file` were used
  - `NAMESPACE` updated to import `xml2` and not `XML`
  - Make `test_sensors.R` work
  - `XML::isXMLString`
    - replaced with `grepl("^<(.*)>$", "...")`
  - `.filterXmlChildren` and `.filterXmlOnlyNoneTexts` replaced manually using `xml2::xml_child(..)`, `xml2::xml_find_first(..)` or `xml2::xml_find_all(..)`
  - `obj[[` replaced, because subsetting with `[[` does not work with `xml2` (107 occurrences at this point!)
    - `obj\[\[(.*?)\]\]` replaced with `xml2::xml_child(x = obj, search = $1, ns = SosAllNamespaces())`
    - `obj[[..]]` was used (file `PrintShowStructureSummary-methods.R`)
    - `obj[["elementCount"]][["Count"]][["value"]]` > search for `SosAllNamespaces())[[` and fix manually to use XPath (4 occurrences)
  - `.children[[`
  - `is.null\(\.` with some XML object, should be `is.na(..)`, which picks up on `"xml_missing"` objects
  - `parseOwsRange`
  - `parseSosFilter_Capabilities`
  - `parseOwsServiceIdentification`
  - `parseTime`
  - `parseSosObservationOffering` (also for 2.0.0)
  - `test_sensors.R`
  - `XML::addAttributes`
    - Sometimes `.attrs` is used, which is replaced with `xml2::xml_set_attrs()`, and sometimes not (single `...`), which is replaced with `xml2::xml_set_attr()`; the `_set_attr` variants operate directly on the object (no need to re-assign), and often statements are multi-line (18 occurrences)
    - `.sos100_NamespaceDefinitionsForAll`
  - `XML::xmlNode` and `XML::addChildren`
    - replaced with `xml2::xml_new_root("<node name>")` and `xml2::xml_add_child("<node name>")`
    - `attrs` parameter replaced with `xml2::xml_set_attrs()`
  - `XML::addChildren` with `"append = TRUE"`: replace with a for loop and `xml2::xml_add_child(..)`
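
The replacement of chained `[[` subsetting by namespace-aware lookups, as described in the steps above, can be sketched as follows. The document fragment and its namespace URI are only illustrative, and `xml2::xml_ns(..)` stands in for the `sos4R` helper `SosAllNamespaces()`, which is assumed to supply a comparable prefix-to-URI mapping.

```r
library(xml2)

# Hypothetical namespaced fragment; element names and URI are only illustrative
obj <- read_xml(
  '<swe:DataArray xmlns:swe="http://www.opengis.net/swe/1.0.1">
     <swe:elementCount><swe:Count><swe:value>3</swe:value></swe:Count></swe:elementCount>
   </swe:DataArray>'
)

# Prefix-to-URI mapping collected from the document itself
ns <- xml_ns(obj)

# Single level: counterpart of obj[["elementCount"]]
count_parent <- xml_child(obj, "swe:elementCount", ns = ns)

# Chained subsetting obj[["elementCount"]][["Count"]][["value"]] as one XPath
value_node <- xml_find_first(obj, "swe:elementCount/swe:Count/swe:value", ns = ns)
value <- xml_text(value_node)

# Passing ns to xml_name() yields the name including its prefix
qualified_name <- xml_name(value_node, ns = ns)

# Without the prefix the XPath does not match the namespaced element
unprefixed <- xml_find_first(obj, "elementCount")
```

A lookup without the namespace prefix quietly yields an `xml_missing` node instead of an error, so such mistakes only surface in later checks or tests.
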

Limitations of regexes for the actual switch are due to multi-line statements and to function results not being the same.
In particular, the extensively used subsetting with `[[` no longer works the same way.
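
The multi-line limitation can be reproduced in base R with the `xmlGetAttr` pattern from the steps above. This is only a sketch: the source lines are made up, and R writes the back-reference as `\\1` where the search-and-replace notation above uses `$1`.

```r
# Hypothetical source lines as they might appear in a package using XML
single_line <- 'href <- XML::xmlGetAttr(node = obj, name = "href")'
multi_line <- c(
  'href <- XML::xmlGetAttr(node = obj,',
  '                        name = "href")'
)

# Pattern and replacement as in the steps above, with R escaping
pattern <- "XML::xmlGetAttr\\(node = (.*), name = "
replacement <- "xml2::xml_attr(x = \\1, attr = "

# The single-line call is rewritten completely
converted_single <- sub(pattern, replacement, single_line, perl = TRUE)

# Applied line by line, the multi-line call is not matched and stays unchanged
converted_multi <- sub(pattern, replacement, multi_line, perl = TRUE)
```

Statements that are left unchanged in this way are the ones that need manual fixing, for example after removing the newline or fixing the indentation first.
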