Reads an XML document into a flat dataframe structure.
Path to the XML document. Can be either a local path or a URL.
The XML document is parsed and stored in a dataframe structure (flat XML). The first four columns of a flat XML dataframe are standard columns. Their names all end with a dot. These columns are:
elem.: The element identifier of the current XML element (without the tag delimiters < and >).
elemid.: A unique, ascending numerical ID for each XML element. The first XML element is assigned 1 as its ID. This ID is used by many of the flatxml functions.
attr.: Name of an attribute. For each attribute of an XML element the dataframe will have an additional row.
value.: The value of either the attribute (if attr. is not NA) or the element itself (if attr. is NA). NA if the element has no value.
The columns after these four standard columns represent the 'path' to the current element, starting from the root element of the XML document in column 5 all
the way down to the current element. The number of columns of the dataframe is therefore determined by the depth of the hierarchical structure of the XML document.
In this dataframe representation, the hierarchical structure of the XML document becomes very easy to understand. All
flatxml functions work with this flat XML dataframe.
If an XML element has N attributes, it is represented by (N+1) rows in the flat XML dataframe: one row for the element's value (with value. set to NA if the element has no value) and one row for each attribute. In the attribute rows, the names of the attributes are stored in the attr. field and their respective values in the value. field. Even if there are multiple rows for one XML element, the elemid. field has the same value in all of them, because the rows belong to the same XML element.
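To make this layout concrete, the following base R sketch builds a small data frame by hand that follows the description above. It is an illustration, not output produced by the package: the path column names level1 and level2, the order of the rows, and the NA value for the parent element are assumptions made for the example.

```r
# Hand-built illustration of the flat layout described above.
# This is NOT produced by flatxml; it only mirrors the documented structure.
#
# XML being pictured:
#   <country name="France" code="FR">
#     <capital>Paris</capital>
#   </country>
flat <- data.frame(
  elem.   = c("country", "country", "country", "capital"),
  elemid. = c(1, 1, 1, 2),
  attr.   = c(NA, "name", "code", NA),
  value.  = c(NA, "France", "FR", "Paris"),
  # Path columns: root element first, then one column per nesting level.
  # The names level1/level2 are hypothetical.
  level1  = c("country", "country", "country", "country"),
  level2  = c(NA, NA, NA, "capital"),
  stringsAsFactors = FALSE
)

# An element with N attributes occupies N + 1 rows:
# <country> has 2 attributes, <capital> has none.
rows_per_element <- table(flat$elemid.)

# Attribute rows of element 1 (attr. is not NA)
attrs <- flat[flat$elemid. == 1 & !is.na(flat$attr.), c("attr.", "value.")]

# Value row of element 2 (attr. is NA)
capital_value <- flat$value.[flat$elemid. == 2 & is.na(flat$attr.)]
```

Filtering on elemid. together with is.na(attr.) is what separates an element's own value from its attributes in this representation.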
A dataframe containing the XML document in a flat structure. See the Details section for more information on its structure.
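A minimal usage sketch follows. It assumes that the flatxml package is installed and that the function documented here is fxml_importXMLFlat(); the function name does not appear in the text above and is taken from the package as an assumption. The XML content and the temporary file are made up for the example.

```r
# Sketch only: assumes flatxml is installed and that the function documented
# here is fxml_importXMLFlat() (the name is an assumption, see the note above).

# Write a small, made-up XML document to a temporary local file.
xml_file <- tempfile(fileext = ".xml")
writeLines(c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<country name="France" code="FR">',
  '  <capital>Paris</capital>',
  '</country>'
), xml_file)

if (requireNamespace("flatxml", quietly = TRUE)) {
  # The argument is the path to the XML document (a local path here).
  flat <- flatxml::fxml_importXMLFlat(xml_file)

  # Per the description, the first four columns are the standard columns.
  print(names(flat)[1:4])
  print(flat)
} else {
  message("flatxml is not installed; skipping the import.")
}
```

According to the argument description, a URL could be passed in place of the local path.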
Joachim Zuckarelli