fxml_toDataFrame: Extracting data from an XML document into a dataframe

View source: R/flatxml.r

Description

Reads in data from an XML document and returns a dataframe.

Usage

fxml_toDataFrame(xmlflat.df, siblings.of, same.tag = TRUE, attr.only = NULL,
  attr.not = NULL, elem.or.attr = "elem", col.attr = "",
  include.fields = NULL, exclude.fields = NULL)

Arguments

xmlflat.df

A flat XML dataframe created with fxml_importXMLFlat.

siblings.of

ID of one of the XML elements that contain the data records. All data records need to be on the same hierarchical level as the XML element with this ID.

same.tag

If TRUE, only elements of the same type (xmlflat.df$elem.) as the element siblings.of are considered data records. If FALSE, all elements on the same hierarchical level as the element siblings.of are considered data records.

attr.only

A list of named vectors representing attribute/value combinations the data records must match. The name of an element in the list is the XML element name to which the attribute belongs. The list element itself is a named vector. The vector's elements represent different attributes (= the names of the vector elements) and their values (= vector elements). Example: attr.only = list(tag1 = c(attrib1 = "Value 1", attrib2 = "Value 2"), tag2 = c(attrib3 = "Value 3")) will only include tag1 elements of the form <tag1 attrib1 = "Value 1" attrib2 = "Value 2"> and tag2 elements of the form <tag2 attrib3 = "Value 3"> as data records.

attr.not

A list of vectors representing attribute/value combinations the XML elements must not match to be considered as data records. See argument attr.only for details.

elem.or.attr

Either "elem" or "attr". Defines whether the names of the record fields (the columns of the dataframe) are taken from the names (tags) of the respective XML elements, i.e. the children of the elements on the same level as siblings.of ("elem"), or from an attribute of those elements ("attr").

col.attr

If elem.or.attr is "attr" then col.attr specifies the name of the attribute that gives the record field / column names.

include.fields

A character vector with the names of the fields that are to be included in the result dataframe. By default, all fields from the XML document are included.

exclude.fields

A character vector with the names of the fields that are to be excluded from the result dataframe. By default, no fields from the XML document are excluded.
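
The filter arguments can be combined in a single call. The following is a minimal sketch, not taken from the package: the XML content, the status attribute, and the field names are made up for illustration, and the record ID is looked up through the elem. and elemid. columns of the flat dataframe instead of being hard-coded.

```r
library(flatxml)

# Hypothetical sample document: three records, one of them marked as a draft
xml.lines <- c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<data>',
  '<record status="final"><name>A</name><value>1</value><note>x</note></record>',
  '<record status="draft"><name>B</name><value>2</value><note>y</note></record>',
  '<record status="final"><name>C</name><value>3</value><note>z</note></record>',
  '</data>')
xml.file <- tempfile(fileext = ".xml")
writeLines(xml.lines, xml.file)

flat.df <- fxml_importXMLFlat(xml.file)

# ID of the first <record> element; which() drops any NA comparisons
record.id <- flat.df$elemid.[which(flat.df$elem. == "record")][1]

# Keep only records with status="final" and only the fields name and value
final.df <- fxml_toDataFrame(flat.df, siblings.of = record.id,
  attr.only = list(record = c(status = "final")),
  include.fields = c("name", "value"))

# Alternative formulation: drop records with status="draft" and the note field
nodraft.df <- fxml_toDataFrame(flat.df, siblings.of = record.id,
  attr.not = list(record = c(status = "draft")),
  exclude.fields = c("note"))
```

With this sample document, both calls are intended to select the same two records; which formulation reads better depends on whether the wanted or the unwanted attribute values are easier to enumerate.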

Details

The data to be read in can be structured in one of two ways. In the first, each field of a record is a child element named after the field:

<record>
<field1>Value of field1</field1>
<field2>Value of field2</field2>
<field3>Value of field3</field3>
</record>
...

In this case elem.or.attr would need to be "elem" because the field names of the data records (field1, field2, field3) are the names of the elements.

In the second, the field names are given as attribute values:

<record>
<column name="field1">Value of field1</column>
<column name="field2">Value of field2</column>
<column name="field3">Value of field3</column>
</record>
...

Here, the names of the fields are attributes, so elem.or.attr would need to be "attr" and col.attr would be set to "name", so that fxml_toDataFrame() knows where to look for the field/column names.

In any case, siblings.of would be the ID (xmlflat.df$elemid.) of one of the <record> elements.
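
As a rough sketch of both layouts, the following writes each variant to a temporary file and reads it back. The file contents and the helper first_record_id are assumptions made for illustration and are not part of the package; the helper only relies on the elem. and elemid. columns mentioned above.

```r
library(flatxml)

# Hypothetical helper: ID of the first <record> element in a flat XML dataframe
first_record_id <- function(flat.df) {
  flat.df$elemid.[which(flat.df$elem. == "record")][1]
}

# Layout 1: field names are the names of the child elements
xml.elem <- c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<data>',
  '<record><field1>A1</field1><field2>A2</field2></record>',
  '<record><field1>B1</field1><field2>B2</field2></record>',
  '</data>')
file.elem <- tempfile(fileext = ".xml")
writeLines(xml.elem, file.elem)
flat.elem <- fxml_importXMLFlat(file.elem)
df.elem <- fxml_toDataFrame(flat.elem, siblings.of = first_record_id(flat.elem),
  elem.or.attr = "elem")

# Layout 2: field names are stored in the "name" attribute of <column> elements
xml.attr <- c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<data>',
  '<record><column name="field1">A1</column><column name="field2">A2</column></record>',
  '<record><column name="field1">B1</column><column name="field2">B2</column></record>',
  '</data>')
file.attr <- tempfile(fileext = ".xml")
writeLines(xml.attr, file.attr)
flat.attr <- fxml_importXMLFlat(file.attr)
df.attr <- fxml_toDataFrame(flat.attr, siblings.of = first_record_id(flat.attr),
  elem.or.attr = "attr", col.attr = "name")
```

Looking up the ID by element name avoids hard-coding a number that depends on how deeply the records are nested in a particular document.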

Value

A dataframe with the data read in from the XML document.

Author(s)

Joachim Zuckarelli

See Also

fxml_importXMLFlat

Examples

# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Extract the data out of the XML document. The data records are on the same hierarchical level
# as element with ID 3 (xml.dataframe$elemid. ==  3).
# The field names are given in the "name" attribute of the children elements of element no. 3
# and its siblings
population.df <- fxml_toDataFrame(xml.dataframe, siblings.of=3, elem.or.attr="attr",
  col.attr="name")
# Exclude the "Value Footnote" field from the returned dataframe
population.df <- fxml_toDataFrame(xml.dataframe, siblings.of=3, elem.or.attr="attr",
  col.attr="name", exclude.fields=c("Value Footnote"))


# Load example file with soccer world cup data (data from
# https://www.fifa.com/fifa-tournaments/statistics-and-records/worldcup/index.html)
# and create flat dataframe
example2 <- system.file("soccer.xml", package="flatxml")
xml.dataframe2 <- fxml_importXMLFlat(example2)

# Extract the data out of the XML document. The data records are on the same hierarchical level
# as element with ID 3 (xml.dataframe2$elemid. == 3).
# The field names are given by the names of the children elements of element no. 3
# and its siblings.
worldcups.df <- fxml_toDataFrame(xml.dataframe2, siblings.of=3, elem.or.attr="elem")

jsugarelli/flatxml documentation built on Aug. 7, 2018, 3:31 p.m.