hu_read_html: Read HTML from a URL with Browser Emulation & in a JavaScript...
In hrbrmstr/htmlunit: Tools to Scrape Dynamic Web Content via the 'HtmlUnit' Java Library

hu_read_html

R Documentation

Read HTML from a URL with Browser Emulation & in a JavaScript Context

Description

Use a JavaScript-enabled browser context to read and render HTML from a URL.

Usage

hu_read_html(
  url,
  emulate = c("best", "chrome", "firefox", "ie", "edge"),
  ret = c("html_document", "text"),
  js_delay = 2000L,
  timeout = 30000L,
  ignore_ssl_errors = TRUE,
  enable_dnt = FALSE,
  download_images = FALSE,
  options = c("RECOVER", "NOERROR", "NOBLANKS")
)

Arguments

`url`	URL to retrieve
`emulate`	browser to emulate; one of "`best`", "`chrome`", "`firefox`", "`ie`", or "`edge`"
`ret`	what to return; if `html_document` (the default) then the HTML created by the `HtmlUnit` emulated browser context is passed to `xml2::read_html()` and an `xml2` `html_document`/`xml_document` is returned. Note that this causes further HTML processing by `xml2`/`libxml2` so is not exactly what `HtmlUnit` generated. If you want the HTML code (text) without any further processing then use `text` as the value.
`js_delay`	time (ms) to allow loaded JavaScript to execute; default is 2 seconds (2000 ms)
`timeout`	overall timeout (ms); `0` means wait indefinitely (not recommended). Note that the timeout is applied twice: first when making the socket connection, then again for data retrieval. If timing is critical, allow for up to twice the value specified here. Default is 30 seconds (30000 ms).
`ignore_ssl_errors`	Should SSL/TLS errors be ignored? The default (`TRUE`) is a current hack due to how `HtmlUnit` seems to handle virtually hosted sites with multiple vhosts and multiple certificates. You can try `FALSE` initially and switch back to `TRUE` if you encounter issues.
`enable_dnt`	Enable the "Do Not Track" header. Default: `FALSE`.
`download_images`	Download images as the page is loaded? Since this function is a high-level wrapper designed to read HTML, it is recommended that you leave this at the default (`FALSE`) to save time and bandwidth.
`options`	options to pass to `xml2::read_html()` if `ret` == `html_document`.
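
To make the `timeout`, `ret`, and `options` arguments more concrete, here is a minimal sketch, not part of the package. The helpers `timeout_for_budget()` and `fetch_raw_then_parse()` are hypothetical names introduced for illustration. The sketch assumes that `ret = "text"` yields a character string that `xml2::read_html()` can parse, and that the time spent in `js_delay` is not covered by `timeout`.

```r
# Hypothetical helper (not part of htmlunit): because the timeout is applied
# once for the socket connection and once for data retrieval, halve the
# budget so the worst case stays within it. Assumption: js_delay is extra
# and is not accounted for here.
timeout_for_budget <- function(budget_ms) {
  stopifnot(is.numeric(budget_ms), length(budget_ms) == 1, budget_ms > 0)
  # Never return 0, since a timeout of 0 means "wait indefinitely".
  max(1L, as.integer(budget_ms %/% 2))
}

# Hypothetical helper: fetch the HTML text exactly as HtmlUnit generated it,
# then parse it separately with the same libxml2 options that
# hu_read_html() uses by default when ret = "html_document".
fetch_raw_then_parse <- function(url, budget_ms = 60000) {
  raw_html <- htmlunit::hu_read_html(
    url,
    ret = "text",
    timeout = timeout_for_budget(budget_ms)
  )
  xml2::read_html(raw_html, options = c("RECOVER", "NOERROR", "NOBLANKS"))
}
```

Keeping the raw text around is useful when you want to inspect what `HtmlUnit` produced before `xml2`/`libxml2` applies its own recovery and clean-up.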

Details

For the code in the examples, this is the site that is being scraped:

Figure: test-url-table.png

Note that the page has a table of values, but the table is rendered via JavaScript.
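
The following sketch shows one way to see the effect of JavaScript rendering on such a page. It assumes the `xml2` and `rvest` packages are installed alongside `htmlunit` (with a working Java setup), and that the rendered values end up in an HTML `<table>` element. `count_tables()` and `compare_static_and_rendered()` are hypothetical helper names, not package functions.

```r
# Hypothetical helper: count <table> elements in a parsed document.
count_tables <- function(doc) {
  length(rvest::html_elements(doc, "table"))
}

# Hypothetical helper: read the same URL without and with JavaScript
# execution and report how many tables each parsed document contains.
compare_static_and_rendered <- function(url, js_delay = 2000L) {
  # Plain fetch: no JavaScript is executed.
  static_doc <- xml2::read_html(url)
  # HtmlUnit fetch: page scripts get js_delay milliseconds to run.
  rendered_doc <- htmlunit::hu_read_html(url, js_delay = js_delay)
  list(
    static_tables = count_tables(static_doc),
    rendered_tables = count_tables(rendered_doc),
    tables = rvest::html_table(rendered_doc)
  )
}
```

If the table is missing from the rendered result, a larger `js_delay` may give the page's scripts enough time to finish.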

Value

An `xml2` `html_document`/`xml_document` if `ret` is `html_document`; otherwise the HTML document text generated by `HtmlUnit`.

Examples

## Not run: 
test_url <- "https://hrbrmstr.github.io/htmlunitjars/index.html"
hu_read_html(test_url)

## End(Not run)

hrbrmstr/htmlunit documentation built on July 4, 2025, 12:45 a.m.
