xml_extract: XML Extract.
In curtisalexander/CRAmisc: Curtis Miscellaneous


Extract XML text or an XML attribute via XPath.

xml_extract(x, xpath, extract_type, extract_value = NULL, ret_var_name = NULL)

`x`	XML document: a literal XML document, a URL, or a string.
`xpath`	A string containing an XPath 1.0 expression.
`extract_type`	The string "text" or "attr" selecting the type of extraction.
`extract_value`	The attribute whose value is to be extracted. Only needs to be set when `extract_type = "attr"`.
`ret_var_name`	The name to give the extracted value. If `xml_extract` is used in an assignment statement, leave `ret_var_name` as `NULL`. If `xml_extract` is used within a functional pipeline, it may be necessary to name the returned value.

`xml_extract` may be used as a user-defined function (UDF) within a dplyr pipeline. The simplest use is to call `xml_extract` inside `dplyr::mutate`. For details see vignette("chunked-invoke-rows-xml").
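As a rough illustration of the `mutate` pattern, the sketch below uses a hypothetical stand-in, `xml_extract_sketch`, rather than the real `xml_extract`. It assumes that `xml_extract` wraps `xml2::xml_find_first` together with `xml_text()`/`xml_attr()`; this is not the CRAmisc source, and `ret_var_name` is omitted. The example data is made up, and `rowwise()` is used so that each call receives a single document string.

```r
library(xml2)
library(dplyr)

# Hypothetical stand-in for xml_extract (assumption: it wraps
# xml2::xml_find_first plus xml_text()/xml_attr()); not the CRAmisc source.
xml_extract_sketch <- function(x, xpath, extract_type, extract_value = NULL) {
  node <- xml_find_first(read_xml(x), xpath)
  switch(extract_type,
         text = xml_text(node),
         attr = xml_attr(node, extract_value),
         stop('extract_type must be "text" or "attr"'))
}

# Illustrative data: one XML document per row.
docs <- tibble(
  id  = 1:2,
  doc = c("<book id='b1'><title>R Basics</title></book>",
          "<book id='b2'><title>XPath Notes</title></book>")
)

# rowwise() makes `doc` a single string inside each mutate() call.
titled <- docs %>%
  rowwise() %>%
  mutate(title   = xml_extract_sketch(doc, "//title", "text"),
         book_id = xml_extract_sketch(doc, "/book", "attr", "id")) %>%
  ungroup()

print(titled)
```

With the real package, the call inside `mutate` would use `xml_extract` itself. The rowwise step matters because `read_xml` parses one document at a time.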

A more complex use is to parse an arbitrary number of text and attribute values from an XML document. To do this, store the parameter values in a data frame and iterate over its rows with `purrr::pmap`.
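A minimal sketch of the data-frame-plus-`pmap` approach follows. It again relies on a hypothetical `xml_extract_sketch`, built on the assumption that `xml_extract` wraps `xml2::xml_find_first`. Each row of the illustrative `params` data frame describes one value to pull from the same document. `purrr::pmap_chr` matches the column names to the function's arguments and returns one string per row.

```r
library(xml2)
library(purrr)

# Hypothetical stand-in for xml_extract (assumption: it wraps
# xml2::xml_find_first plus xml_text()/xml_attr()); not the CRAmisc source.
xml_extract_sketch <- function(x, xpath, extract_type, extract_value = NULL) {
  node <- xml_find_first(read_xml(x), xpath)
  switch(extract_type,
         text = xml_text(node),
         attr = xml_attr(node, extract_value),
         stop('extract_type must be "text" or "attr"'))
}

doc <- "<order id='42'><customer>Ada</customer><total currency='USD'>19.99</total></order>"

# One row per value to extract; extract_value is only used for "attr" rows.
params <- data.frame(
  xpath         = c("//customer", "//total", "//total", "/order"),
  extract_type  = c("text", "text", "attr", "attr"),
  extract_value = c(NA, NA, "currency", "id"),
  stringsAsFactors = FALSE
)

# pmap_chr passes each row's columns as named arguments.
values <- pmap_chr(params, function(xpath, extract_type, extract_value) {
  xml_extract_sketch(doc, xpath, extract_type, extract_value)
})

print(values)
```

Adding rows to `params` extends the extraction without changing any code. This is the main appeal of driving the UDF from a data frame.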

Because `xml_extract` relies on `xml_find_first`, an XPath that matches no node does not raise an error. This is helpful when iterating over a set of XML documents whose schemas are inconsistent.
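The behavior with inconsistent schemas can be seen in the sketch below. It uses a hypothetical `xml_extract_sketch` and assumes `xml_extract` behaves like `xml2::xml_find_first` followed by `xml_text()`. When no node matches, `xml_find_first` returns a missing node, and `xml_text()` turns it into `NA` rather than an error. The documents are invented for illustration.

```r
library(xml2)

# Hypothetical stand-in for xml_extract (assumption: it wraps
# xml2::xml_find_first plus xml_text()/xml_attr()); not the CRAmisc source.
xml_extract_sketch <- function(x, xpath, extract_type, extract_value = NULL) {
  node <- xml_find_first(read_xml(x), xpath)
  switch(extract_type,
         text = xml_text(node),
         attr = xml_attr(node, extract_value),
         stop('extract_type must be "text" or "attr"'))
}

# Two documents with inconsistent schemas: only the first has <email>.
docs <- c("<user><name>Ada</name><email>ada@example.com</email></user>",
          "<user><name>Grace</name></user>")

# The second document has no match, so it yields NA instead of stopping the loop.
emails <- vapply(docs, xml_extract_sketch, character(1),
                 xpath = "//email", extract_type = "text",
                 USE.NAMES = FALSE)

print(emails)
```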

Suggested resources for XPath are

The extracted text or attribute value from an XML tag.

See vignette("chunked-invoke-rows") for usage.

curtisalexander/CRAmisc documentation built on May 14, 2019, 12:52 p.m.