extractHTMLStrip: Simply strip HTML Tags from Document


View source: R/extract.R

Description

extractHTMLStrip parses a URL, a character string, or a file, reads the resulting DOM tree, removes all HTML tags from it, and returns the text content without markup.
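The parse-then-strip idea described above can be sketched with the XML package, which provides htmlTreeParse. This is a minimal illustration, not the package's own implementation: the helper name strip_tags_sketch and the UTF-8 default are assumptions made for the example.

```r
library(XML)  # provides htmlTreeParse(), xmlRoot(), xmlValue(), free()

# Hypothetical helper for illustration only; not part of tm.plugin.webmining.
strip_tags_sketch <- function(html, asText = TRUE, encoding = "UTF-8") {
  # Parse the input into an internal DOM tree.
  doc <- htmlTreeParse(html, asText = asText, useInternalNodes = TRUE,
                       encoding = encoding)
  # Release the C-level document when the function exits.
  on.exit(free(doc))
  # xmlValue() on the root node concatenates the text of all descendant
  # nodes, which drops the tags and keeps only the text content.
  xmlValue(xmlRoot(doc))
}

stripped <- strip_tags_sketch("<html><body><p>Hello <b>world</b>!</p></body></html>")
print(stripped)
```

Note that this sketch also keeps the text inside elements such as script or style, because it takes the value of every text node under the root.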

Usage

extractHTMLStrip(url, asText = TRUE, encoding, ...)

Arguments

url

character string holding either the HTML content itself or a URL or filename to read it from (see asText)

asText

logical; specifies whether url holds the HTML content as a character string (TRUE) rather than a URL or filename (FALSE). Defaults to TRUE

encoding

specifies the local encoding to be used for parsing; the appropriate value depends on the platform

...

Additional parameters passed on to htmlTreeParse
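The two input modes controlled by asText can be illustrated as follows. This is a hedged sketch that assumes tm.plugin.webmining is installed; the sample HTML, the temporary file, and the explicit "UTF-8" encoding are made up for the example.

```r
library(tm.plugin.webmining)

html <- "<html><body><h1>Title</h1><p>Some <em>marked-up</em> text.</p></body></html>"

# Default mode: `url` is the HTML content itself (asText = TRUE).
text_from_string <- extractHTMLStrip(html)

# File mode: `url` names a file (or URL) that is read and parsed.
tmp <- tempfile(fileext = ".html")
writeLines(html, tmp)
text_from_file <- extractHTMLStrip(tmp, asText = FALSE, encoding = "UTF-8")

print(text_from_string)
print(text_from_file)
```

Passing encoding explicitly, as in the second call, avoids relying on a platform-dependent choice.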

Note

Input text should be enclosed in <html>'TEXT'</html> tags to ensure correct DOM parsing. This is an issue especially on Windows (.Platform$OS.type == "windows").
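One way to follow this note for bare text fragments is to add the wrapper tags before parsing. The helper below is a hypothetical base-R sketch, not a function of the package (the package's own encloseHTML, listed under See Also, serves a similar purpose); the name ensure_html_wrapper and the detection pattern are assumptions.

```r
# Hypothetical helper: wrap a fragment in <html> tags unless it already
# contains an opening <html> tag.
ensure_html_wrapper <- function(x) {
  has_html <- grepl("<html[ >]", x, ignore.case = TRUE)
  ifelse(has_html, x, paste0("<html>", x, "</html>"))
}

fragment <- "Prices rose <b>3%</b> in May."
wrapped <- ensure_html_wrapper(fragment)
print(wrapped)
```

The wrapped string can then be passed as the url argument of extractHTMLStrip with asText = TRUE.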

Author(s)

Mario Annau

See Also

xmlNode

htmlTreeParse encloseHTML


tm.plugin.webmining documentation built on May 2, 2019, 1:10 p.m.