XML2Obs: Parse XML files into a list of "observations"


View source: R/XML2R.R

Description

This function takes a collection of urls that point to XML files and coerces the relevant information into a list of observations. An "observation" is a matrix with one row; it can also be thought of as a single instance of XML attributes (and value) at a particular level of the XML hierarchy. The names of the list reflect the XML node ancestry from which each observation was extracted.
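To make the shape of the return value concrete, here is a minimal base R sketch of a list of observations. The node names, attribute names, and values are hypothetical and built by hand; this is not output from XML2Obs.

```r
# Hand-built illustration of the structure described above:
# a list of one-row matrices whose names record the node ancestry.
# All names and values here are hypothetical.
obs <- list(
  "game//team" = matrix(c("NYY", "home"), nrow = 1,
                        dimnames = list(NULL, c("name", "type"))),
  "game//team" = matrix(c("BOS", "away"), nrow = 1,
                        dimnames = list(NULL, c("name", "type"))),
  "game//team//player" = matrix(c("12", "Smith"), nrow = 1,
                                dimnames = list(NULL, c("id", "last")))
)

# Count observations at each level of the hierarchy
counts <- table(names(obs))

# Observations that share a name can be stacked into a single table
teams <- do.call(rbind, unname(obs[names(obs) == "game//team"]))
```

Because every observation is a one-row matrix, observations with the same name and the same columns can be combined with rbind, which is why list names are repeated rather than unique.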

Usage

XML2Obs(urls, xpath, append.value = TRUE, as.equiv = TRUE,
  url.map = FALSE, local = FALSE, quiet = FALSE, ...)

Arguments

urls

character vector. Either urls that point to an XML file online or a local XML file name.

xpath

XML XPath expression that is passed to getNodeSet. If missing, the entire root and all descendants are captured and returned (i.e., xpath = "/").

append.value

logical. Should the XML value be appended for relevant observations?

as.equiv

logical. Should observations from two different files (but with the same ancestry) be returned with the same name?

url.map

logical. If TRUE, the 'url_key' column will contain a condensed url identifier (for each observation) and full urls will be stored in the "url_map" element. If FALSE, the full urls are included (for each observation) as a 'url' column and no "url_map" is included.

local

logical. Should urls be treated as paths to local files?

quiet

logical. If FALSE, the name of the file currently being parsed is printed.

...

arguments passed along to httr::GET
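
As a rough illustration of how an XPath expression such as the xpath argument selects nodes, the sketch below calls getNodeSet from the XML package directly on an inline document. The XML content and the "//player" expression are hypothetical, and the way each node is turned into a one-row matrix is an assumption for illustration, not necessarily what XML2Obs does internally.

```r
library(XML)

# Hypothetical inline document standing in for a downloaded XML file
xml_text <- paste0(
  '<game><team id="NYY">',
  '<player id="1" last="Smith"/>',
  '<player id="2" last="Jones"/>',
  '</team></game>'
)
doc <- xmlParse(xml_text, asText = TRUE)

# Select every <player> node, wherever it sits in the hierarchy
players <- getNodeSet(doc, "//player")

# Turn each node's attributes into a one-row matrix (an "observation");
# t() of a named character vector yields a 1 x n matrix with column names
player_obs <- lapply(players, function(node) t(xmlAttrs(node)))
```

A narrower expression returns fewer nodes and therefore fewer observations, which can matter when the files are large.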

Details

When url.map = TRUE, a "url_key" column is appended to each observation to help track its origin. The value of the "url_key" column is not the actual file name but a simplified identifier, which avoids repeating long file names for each observation. For this reason, an additional element (named "url_map") is added to the list of observations in case the actual file names are needed.
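
The lookup from a key back to a full file name can be sketched in base R as follows. This assumes, purely for illustration, that the map behaves like a named character vector from key to url; the actual structure of the "url_map" element may differ, and the keys, urls, and attribute values below are hypothetical.

```r
# Hypothetical stand-in for the "url_map" element (assumed structure:
# a named character vector mapping short keys to full urls)
url_map <- c(
  url1 = "http://example.com/a/very/long/path/file1.xml",
  url2 = "http://example.com/a/very/long/path/file2.xml"
)

# Hypothetical observation carrying a short key instead of the full url
ob <- matrix(c("12", "Smith", "url2"), nrow = 1,
             dimnames = list(NULL, c("id", "last", "url_key")))

# Recover the full url for this observation from its key
full_url <- unname(url_map[ob[, "url_key"]])
```

Storing the short key in every observation and the long url once keeps the observations compact when many of them come from the same file.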

Value

A list of "observations" and (possibly) the "url_map" element.

See Also

urlsToDocs, docsToNodes, nodesToList, listsToObs

Examples

## Not run: 
urls <- c("http://gd2.mlb.com/components/game/mlb/year_2013/mobile/346180.xml",
           "http://gd2.mlb.com/components/game/mlb/year_2013/mobile/346188.xml")
obs <- XML2Obs(urls)
table(names(obs))

# parses local files as well
players <- system.file("extdata", "players.xml", package = "XML2R")
obs2 <- XML2Obs(players, local = TRUE)
table(names(obs2))

## End(Not run)

cpsievert/XML2R documentation built on May 13, 2019, 10:54 p.m.