extractHTMLStrip: Simply strip HTML Tags from Document
In tm.plugin.webmining: Retrieve Structured, Textual Data from Various Web Sources


View source: R/extract.R

extractHTMLStrip parses a URL, a character string of markup, or a file name, builds the DOM tree, removes all HTML tags from the tree, and returns the source text without markup.
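The parse-then-strip idea described above can be sketched directly with the XML package. This is only an illustration under stated assumptions, not the package's own source: the helper name `strip_tags_sketch` is hypothetical, and it assumes the XML package is installed.

```r
# Sketch only: assumes the XML package is installed; not the package's own code.
library(XML)

# Hypothetical helper name, chosen for illustration.
strip_tags_sketch <- function(markup) {
  # Parse the markup string into an internal DOM tree.
  doc <- htmlTreeParse(markup, asText = TRUE, useInternalNodes = TRUE)
  # Release the tree's memory when the function exits.
  on.exit(free(doc))
  # xmlValue() concatenates the text of all descendant nodes, dropping the tags.
  xmlValue(xmlRoot(doc))
}

plain <- strip_tags_sketch("<html><p>Hello <b>world</b></p></html>")
print(plain)
```

Using `useInternalNodes = TRUE` keeps the tree in C-level memory, which is why the sketch frees it explicitly once the text has been extracted.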

extractHTMLStrip(url, asText = TRUE, encoding, ...)

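`url`	character string containing the markup itself, a URL, or a file name
`asText`	logical; specifies whether `url` is a `character` string of markup rather than a URL or file name. Defaults to TRUE
`encoding`	encoding to use when parsing; the appropriate value depends on the platform
`...`	additional parameters passed to `htmlTreeParse`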

Input text should be enclosed in <html> tags, as in <html>TEXT</html>, to ensure correct DOM parsing. This matters especially on Windows (.Platform$OS.type == "windows").
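A minimal usage sketch of the note above, assuming tm.plugin.webmining and its dependencies are installed. The variable names and the sample fragment are made up for illustration, and the wrapping is done with `paste0` rather than any package helper.

```r
# Assumes the tm.plugin.webmining package is installed and loadable.
library(tm.plugin.webmining)

# Hypothetical input: an HTML fragment held in a character string.
fragment <- "<p>Hello <b>world</b></p>"

# Wrap the fragment in <html> tags before parsing, as the note recommends.
wrapped <- paste0("<html>", fragment, "</html>")

# asText = TRUE marks the first argument as markup, not a URL or file name.
stripped <- extractHTMLStrip(wrapped, asText = TRUE)
print(stripped)
```

To read from a URL or a local file instead, the same call would be made with `asText = FALSE` and the location passed as `url`.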

Mario Annau

htmlTreeParse, encloseHTML

tm.plugin.webmining documentation built on May 2, 2019, 1:10 p.m.
