html2text: Identifies the text of an html string

Description

This function sends an html string to the Data Science Toolkit html2text service (see References) to find the main text of the string. The output is a list that contains the extracted text.
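To give a feel for what "main text" means, here is a minimal offline sketch in base R. It is a crude approximation only and is not how RDSTK computes its result. It drops the head, script, and style elements, replaces the remaining tags with spaces, and collapses whitespace. The helper name html2text_local and the list element name text are hypothetical, and regular expressions are far less robust than a real HTML parser.

```r
# Hypothetical helper: a crude, offline approximation of html2text().
# It does NOT contact the Data Science Toolkit; it only uses base R regexes.
html2text_local <- function(html) {
  # Remove elements whose content is never displayed (head, script, style).
  # (?is): case-insensitive matching, and '.' also matches newlines.
  # \b stops '<head' from matching '<header'.
  for (tag in c("head", "script", "style")) {
    pattern <- sprintf("(?is)<%s\\b.*?</%s\\s*>", tag, tag)
    html <- gsub(pattern, " ", html, perl = TRUE)
  }
  # Replace every remaining tag with a space so adjacent words stay apart.
  text <- gsub("<[^>]*>", " ", html, perl = TRUE)
  # Collapse runs of whitespace (spaces, tabs, newlines) and trim both ends.
  text <- trimws(gsub("\\s+", " ", text, perl = TRUE))
  # Mirror the documented return type: a list holding the extracted text.
  list(text = text)
}

html <- '<html><head><title>MyTitle</title></head><body><script
 type="text/javascript">something();</script><div>Some actual
 text</div></body></html>'
html2text_local(html)$text  # "Some actual text"
```

Note that this sketch discards the title along with the rest of the head; whether the real service keeps it is not stated here.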

Usage

html2text(html, session=getCurlHandle())

Arguments

html

A string containing valid html code.

session

The CURLHandle object that holds the curl options and performs the request. For curlMultiPerform, this is an object of class MultiCURLHandle-class.

Value

A list with the main text in the html.

Author(s)

Ryan Elmore

References

http://www.datasciencetoolkit.org/developerdocs#html2text

See Also

curlPerform, getCurlHandle, dynCurlReader

Examples

## Not run:
library(RDSTK)
html <- '<html><head><title>MyTitle</title></head><body><script
 type="text/javascript">something();</script><div>Some actual
 text</div></body></html>'
html2text(html)
## End(Not run)

RDSTK documentation built on May 2, 2019, 6:49 a.m.