xmlr: Tools for parsing and generating XML within R and S-Plus.

# The following illustrates how we can get the text of an HTML document.
# Michael Conklin.

# Also see ./foo.html as an example with JavaScript content
# and a pseudo/fake CSS node.

doc = htmlParse("http://www.omegahat.org/")
txt = xpathSApply(doc, "//body//text()", xmlValue)

# The result is a character vector that contains all the text.

# By limiting the nodes to the body, we avoid the content in <head>,
# such as inlined JavaScript or CSS.
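
# As a minimal sketch of the same steps, the example below parses a small
# inline HTML string instead of a URL, so it needs no network access.
# The sample markup and the clean-up with trimws() and nzchar() are
# assumptions added for illustration; they are not part of the original
# example. It also assumes the package can be loaded as XML.

```r
library(XML)  # assumption: the package is installed and loadable under this name

# Hypothetical sample page with CSS in <head> and text in <body>.
html = paste0(
  "<html><head><title>Demo</title><style>p { color: red; }</style></head>",
  "<body><h1>Heading</h1><p>First <b>bold</b> paragraph.</p></body></html>"
)

# asText = TRUE tells htmlParse() that the argument is markup, not a file or URL.
doc = htmlParse(html, asText = TRUE)

# One element per text node under <body>; <head> content is never visited.
txt = xpathSApply(doc, "//body//text()", xmlValue)

# Real pages often yield whitespace-only text nodes, so trim and drop them.
txt = trimws(txt)
txt = txt[nzchar(txt)]

# Collapse the pieces into a single string if that is more convenient.
allText = paste(txt, collapse = " ")
```

# Whether to keep the pieces separate or paste them together depends on
# what you do next; the vector form keeps one entry per text node.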

# A document may also contain <script> elements in the body
# whose JavaScript you don't want.
# You can omit these:

txt = xpathSApply(doc, "//body//text()[not(ancestor::script)]", xmlValue)
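
# To see the effect of the predicate, the sketch below runs both queries on
# a small inline page. The markup and the variable names allText and noJS
# are hypothetical and chosen only for this illustration.

```r
library(XML)  # assumption: the package is installed and loadable under this name

# Hypothetical page with a <script> element inside <body>.
html = paste0(
  "<html><body><p>Visible text</p>",
  "<script>var hidden = 1;</script></body></html>"
)
doc = htmlParse(html, asText = TRUE)

# Without the predicate, text inside <script> may be returned as well.
allText = xpathSApply(doc, "//body//text()", xmlValue)

# With the predicate, any text node that has a <script> ancestor is skipped.
noJS = xpathSApply(doc, "//body//text()[not(ancestor::script)]", xmlValue)
```

# Comparing allText and noJS shows which pieces the predicate removed.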

# If there are other elements you want to ignore, extend the predicate:

txt = xpathSApply(doc,
                  "//body//text()[not(ancestor::script) and not(ancestor::otherElement)]",
                  xmlValue)
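
# When the list of elements to ignore grows, the predicate can be built
# from a character vector. This is only a sketch: bodyTextExcluding() is a
# hypothetical helper, not a function provided by the package, and the
# sample markup is made up for the illustration.

```r
library(XML)  # assumption: the package is installed and loadable under this name

# Hypothetical helper: return the body text, skipping any text node that
# has an ancestor element named in 'exclude'.
bodyTextExcluding = function(doc, exclude = "script") {
  query = "//body//text()"
  if (length(exclude) > 0) {
    # e.g. "not(ancestor::script) and not(ancestor::code)"
    pred = paste(sprintf("not(ancestor::%s)", exclude), collapse = " and ")
    query = sprintf("%s[%s]", query, pred)
  }
  xpathSApply(doc, query, xmlValue)
}

# Hypothetical page with a <script> and a <code> element in the body.
html = paste0(
  "<html><body><p>Keep me</p><script>var x = 1;</script>",
  "<p>Also <code>drop_me()</code> here</p></body></html>"
)
doc = htmlParse(html, asText = TRUE)

txt = bodyTextExcluding(doc, c("script", "code"))
```

# Building the query with sprintf() keeps the XPath expression readable and
# avoids retyping a long predicate for every new element name.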

cosmicexplorer/xmlr documentation built on May 30, 2019, 8:28 a.m.

inst/examples/HTMLText.R