read_xml: Read HTML or XML.

View source: R/xml_parse.R

Description

Read HTML or XML.

Usage

read_xml(x, encoding = "", ..., as_html = FALSE, options = "NOBLANKS")

read_html(x, encoding = "", ..., options = c("RECOVER", "NOERROR",
  "NOBLANKS"))

## S3 method for class 'character'
read_xml(x, encoding = "", ..., as_html = FALSE,
  options = "NOBLANKS")

## S3 method for class 'raw'
read_xml(x, encoding = "", base_url = "", ...,
  as_html = FALSE, options = "NOBLANKS")

## S3 method for class 'connection'
read_xml(x, encoding = "", n = 64 * 1024,
  verbose = FALSE, ..., base_url = "", as_html = FALSE,
  options = "NOBLANKS")

Arguments

x

A string, a connection, or a raw vector.

A string can be either a path, a URL or literal XML. URLs will be converted into connections using either base::url or, if installed, curl::curl. Local paths ending in .gz, .bz2, .xz or .zip will be automatically uncompressed.

If a connection, the complete connection is read into a raw vector before being parsed.
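The three input types can be compared side by side. This is a minimal sketch, assuming xml2 is installed; the sample document and the temporary file are made up for illustration.

```r
library(xml2)

# Hypothetical sample document used for all three inputs
xml_literal <- "<foo><bar>baz</bar></foo>"

# 1. A string containing literal XML
from_string <- read_xml(xml_literal)

# 2. A raw vector holding the same bytes
from_raw <- read_xml(charToRaw(xml_literal))

# 3. A connection to a file on disk
path <- tempfile(fileext = ".xml")
writeLines(xml_literal, path)
from_conn <- read_xml(file(path))

# All three should describe the same document
xml_name(from_string)
xml_text(xml_find_first(from_conn, "//bar"))
```

Passing the path itself, as in read_xml(path), should give the same result as the connection form.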

encoding

Specify a default encoding for the document. Unless otherwise specified XML documents are assumed to be in UTF-8 or UTF-16. If the document is not UTF-8/16, and lacks an explicit encoding directive, this allows you to supply a default.
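As a sketch of when a default encoding matters, the bytes below are Latin-1 and carry no encoding declaration. The sample bytes and the encoding name "ISO-8859-1" are assumptions chosen for illustration.

```r
library(xml2)

# "<p>café</p>" encoded as Latin-1: the é is the single byte 0xE9,
# which is not valid UTF-8 on its own
latin1_bytes <- c(charToRaw("<p>caf"), as.raw(0xE9), charToRaw("</p>"))

# Supply the encoding explicitly, since the document does not declare one
doc <- read_xml(latin1_bytes, encoding = "ISO-8859-1")
xml_text(doc)
```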

...

Additional arguments passed on to methods.

as_html

Optionally parse an xml file as if it's html.

options

Set parsing options for the libxml2 parser. Zero or more of the options described by xml2:::describe_options(xml2:::xml_parse_options()), such as "RECOVER", "NOERROR" and "NOBLANKS".
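The effect of the default "NOBLANKS" option can be sketched by parsing the same indented document with and without it. This example assumes that "NONET" (the libxml2 option that forbids network access) is an accepted option name; it is used here only as a harmless stand-in for "not NOBLANKS".

```r
library(xml2)

# Hypothetical document with whitespace between the elements
indented <- "<a>\n  <b/>\n  <c/>\n</a>"

# Default: blank text nodes are dropped
no_blanks <- read_xml(indented, options = "NOBLANKS")

# Assumed alternative that leaves blank text nodes in place
with_blanks <- read_xml(indented, options = "NONET")

# xml_contents() includes text nodes, so the counts should differ
length(xml_contents(no_blanks))
length(xml_contents(with_blanks))
```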

base_url

When loading from a connection, raw vector or literal html/xml, this allows you to specify a base url for the document. Base urls are used to turn relative urls into absolute urls.
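A minimal sketch of how a base URL is used to resolve a relative link. The HTML fragment and the example.com address are placeholders, not part of the package.

```r
library(xml2)

# Literal HTML with a relative link; the base URL is a made-up placeholder
doc <- read_html(
  '<html><body><a href="about.html">About</a></body></html>',
  base_url = "http://example.com/docs/"
)

# Pull out the relative href and resolve it against the document's base URL
href <- xml_attr(xml_find_first(doc, "//a"), "href")
absolute <- url_absolute(href, xml_url(doc))
absolute
```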

n

If x is a connection, the number of bytes to read per iteration. Defaults to 64 KB (64 * 1024 bytes).

verbose

When reading from a slow connection, this prints some output on every iteration so you know it's working.
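The connection arguments n and verbose can be tried on a local file. This is a sketch only: the file contents are invented, and a chunk size of 1024 bytes is an arbitrary choice.

```r
library(xml2)

# Write a small, made-up document to a temporary file
path <- tempfile(fileext = ".xml")
writeLines("<catalog><cd>Empire Burlesque</cd></catalog>", path)

# Read through a connection, 1024 bytes per iteration, reporting progress
doc <- read_xml(file(path), n = 1024, verbose = TRUE)
xml_name(doc)
```

For a file this small a single iteration is enough; a smaller n only matters for large or slow sources.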

Value

An XML document. HTML is normalised to valid XML - this may not be exactly the same transformation performed by the browser, but it's a reasonable approximation.
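To see the normalisation in action, the sketch below parses an HTML fragment with unclosed tags and inspects the resulting tree. The fragment is a made-up example.

```r
library(xml2)

# Two paragraphs with no closing tags and no <html> or <body> wrapper
doc <- read_html("<p>Hello<p>World")

# The parser supplies the missing structure
xml_name(xml_root(doc))
paragraphs <- xml_find_all(doc, "//p")
xml_text(paragraphs)

# Serialise to see the normalised markup
cat(as.character(doc))
```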

Examples

# Literal xml/html is useful for small examples
read_xml("<foo><bar /></foo>")
read_html("<html><title>Hi<title></html>")
read_html("<html><title>Hi")

# From a local path
read_html(system.file("extdata", "r-project.html", package = "xml2"))

# From an example file bundled with the package
cd <- read_xml(xml2_example("cd_catalog.xml"))

# From a url
me <- read_html("http://had.co.nz")

xml2 documentation built on Jan. 24, 2018, 5:21 p.m.