
tidyHTML: Tidy HTML content

In omegahat/RTidyHTML: Tidy HTML documents

View source: R/tidy.R

tidyHTML

R Documentation

Tidy HTML content

Description

This function processes an HTML document and tidies its malformed nodes so that the result is legitimate HTML, i.e. elements have end tags (</li>, </p>) and attribute values are enclosed in quotes. It also corrects the HTML in various other ways.

The resulting document has a more regular structure, which, for example, makes processing it with the XML parsing facilities more straightforward.

The function uses the libtidy library from http://tidy.sourceforge.net.

Usage

tidyHTML(doc, asXHTML = FALSE, 
         asText = inherits(doc, "AsIs") ||
                    (!file.exists(doc) && length(grep("\\<", doc))),
         size = nchar(doc)*1.2, withErrors = FALSE)
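The default value of `asText` is a heuristic: `doc` is treated as HTML content when it has class `AsIs` (for example, when wrapped in `I()`), or when no file of that name exists and `grep("\\<", doc)` finds a match. In R's default regular-expression syntax `\<` is a start-of-word anchor rather than a literal `<`, so a mistyped file name such as `"no-such-file.html"` is also treated as content. The sketch below defines a hypothetical helper, `guess_as_text()`, which is not part of RTidyHTML. It mirrors the default but tests for a literal `<` and collapses multi-line input first, on the assumption that detecting markup is the intended behaviour.

```r
# Hypothetical helper, not part of RTidyHTML: decide whether `doc` holds
# HTML content (TRUE) or names a file (FALSE).
guess_as_text <- function(doc) {
  # An explicit I() wrapper always means "this is content".
  if (inherits(doc, "AsIs"))
    return(TRUE)
  # Collapse multi-line input (e.g. from readLines()) into one string so
  # the scalar checks below are well defined.
  doc <- paste(doc, collapse = "\n")
  # Content if no such file exists and the text contains a literal "<";
  # fixed = TRUE turns off regular-expression interpretation.
  !file.exists(doc) && grepl("<", doc, fixed = TRUE)
}

# The pattern in the tidyHTML() default is a start-of-word anchor,
# so it matches an ordinary file name; the literal test does not.
grepl("\\<", "no-such-file.html")              # TRUE
grepl("<", "no-such-file.html", fixed = TRUE)  # FALSE

guess_as_text("<p>Hello")                                  # TRUE
guess_as_text(I("Hello"))                                  # TRUE
guess_as_text(c("<html>", "<body>Hi</body>", "</html>"))   # TRUE
guess_as_text("no-such-file.html")  # FALSE, assuming no such file exists
```

Passing `I(txt)` or an explicit `asText = TRUE` to `tidyHTML()` avoids relying on the heuristic altogether.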

Arguments

`doc`	the name of the file containing the HTML document or the contents of the HTML itself.
`asXHTML`	a logical value controlling whether the result is output as XHTML.
`asText`	a logical value indicating whether the value of `doc` is the HTML content or the name of a file.
`size`	an integer scalar giving a guess of the size of the resulting tidied document.
`withErrors`	a logical value controlling whether a string describing the errors in the input document is also returned.

Value

If `withErrors` is `TRUE`, a list with two elements, `doc` and `errors`, both of which are scalar strings.

If `withErrors` is `FALSE`, a character string containing the tidied document's contents.
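Because the return type depends on `withErrors`, code that may receive either form can normalise it first. The helpers below, `tidied_doc()` and `tidy_errors()`, are hypothetical and not part of RTidyHTML; they rely only on the two shapes described above and are exercised with stand-in values rather than real libtidy output.

```r
# Hypothetical helpers, not part of RTidyHTML, for a tidyHTML() result.

# The tidied markup: the `doc` element when withErrors = TRUE (a list),
# otherwise the character string itself.
tidied_doc <- function(result) {
  if (is.list(result)) result$doc else result
}

# The error report when it was requested, otherwise NA.
tidy_errors <- function(result) {
  if (is.list(result)) result$errors else NA_character_
}

# Stand-in values with the documented shapes (not real libtidy output).
with_errors <- list(doc = "<p>Hello</p>", errors = "(error text)")
without_errors <- "<p>Hello</p>"

tidied_doc(with_errors)      # "<p>Hello</p>"
tidied_doc(without_errors)   # "<p>Hello</p>"
tidy_errors(without_errors)  # NA
```

With the package installed, `tidied_doc(tidyHTML(doc, withErrors = TRUE))` and `tidied_doc(tidyHTML(doc))` would then both yield just the markup.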

Author(s)

Duncan Temple Lang

References

http://tidy.sourceforge.net

Examples

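 # Tidy an HTML file shipped with the package.
 doc = system.file("testData", "foo.html", package = "RTidyHTML")
 tidyHTML(doc)

 # readLines() returns one string per line, so collapse them into a
 # single string and wrap it in I() to mark it as HTML content.
 txt = paste(readLines(url("http://www.omegahat.org")), collapse = "\n")
 tidyHTML(I(txt))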

omegahat/RTidyHTML documentation built on Nov. 29, 2023, 12:42 a.m.
