tidyHTML: Try to "Tidy" Untidy HTML Pages

View source: R/tidyHTML.R

Description

Some web pages need an "HTML Tidy" pass before the parsers in the XML package can handle them. This function sends the page through the online HTML Tidy web service and then parses the tidied result.

Usage

tidyHTML(URL)

Arguments

URL

The URL of the problematic web page.

Value

The parsed page, ready to be used with readHTMLTable from the XML package.

Note

Still no guarantee it will work! :-)

Author(s)

Ananda Mahto

References

http://stackoverflow.com/a/12761741/1270695

Examples

## Not run: 
## Can't find an actual example. The URL from the
##   question is no longer online to test it with.

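library(XML)  # provides readHTMLTable

Page <- "http://en.wikipedia.org/wiki/List_of_countries_by_population"
u <- tidyHTML(Page)         # tidy the page, then parse it
tables <- readHTMLTable(u)  # extract every table on the page
str(tables)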

## End(Not run)
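
Since the example above needs a live page and the online service, the downstream step can also be tried offline. The following is only a rough sketch, not part of the package: it does not call tidyHTML or contact any web service, and the HTML string is a made-up, deliberately untidy table (unclosed cell and row tags). It assumes the XML package is installed and shows how a parsed document is handed to readHTMLTable.

```r
library(XML)

# Hypothetical untidy HTML: <th>, <td> and <tr> tags are never closed.
html <- "<html><body><table>
<tr><th>Country<th>Population
<tr><td>A<td>100
<tr><td>B<td>200
</table></body></html>"

# Parse the string directly instead of fetching a URL.
doc <- htmlParse(html, asText = TRUE)

# readHTMLTable returns a list with one element per table found.
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)
str(tables)
```

The parser in the XML package copes with this small case on its own; tidyHTML is meant for pages where that is not enough.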

mrdwab/SOfun documentation built on June 20, 2020, 6:15 p.m.