html2text: Extracts the main text from an HTML string.

Description

This function processes an HTML string to find its main text, discarding markup and script content. The output is a list that contains the extracted text.

Usage

html2text(html, session = RCurl::getCurlHandle())

Arguments

html

A character string containing valid HTML.

session

The CURLHandle object that holds the request options and processes the request. Defaults to a new handle from RCurl::getCurlHandle(). For curlMultiPerform, this is an object of class MultiCURLHandle.
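
Because session accepts an existing handle, several strings can in principle be processed through one handle instead of creating a new one per call. The sketch below is only an illustration: html2text_many is a hypothetical wrapper, not part of RDSTK, and it assumes the RDSTK and RCurl packages are installed and that a Data Science Toolkit server is reachable.

```r
# Hypothetical wrapper (not part of RDSTK): process several HTML strings
# through a single curl handle.
html2text_many <- function(pages, session = RCurl::getCurlHandle()) {
  lapply(pages, function(page) {
    # A failed request yields NULL instead of stopping the whole loop.
    tryCatch(RDSTK::html2text(page, session = session),
             error = function(e) NULL)
  })
}

# Usage (requires network access and the packages above):
# pages <- c("<html><body><div>First page</div></body></html>",
#            "<html><body><div>Second page</div></body></html>")
# results <- html2text_many(pages)
```

Returning NULL on failure is a design choice of this sketch; a caller that needs to know why a request failed would keep the condition object instead.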

Value

A list containing the main text extracted from the HTML.
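
Since the value is a list rather than a bare string, callers usually need one more step to get at the text. The helper below is a minimal sketch with a hypothetical name (main_text); it assumes the text is stored in an element named "text", which this page does not confirm, and falls back to the first element otherwise. It is demonstrated on a mock list so that it runs without network access.

```r
# Hypothetical helper: pull a single character string out of the returned list.
# Assumption: the text sits in an element named "text"; if no such element
# exists, the first element is used instead.
main_text <- function(result) {
  if (!is.list(result) || length(result) == 0) return(NA_character_)
  value <- if (!is.null(result[["text"]])) result[["text"]] else result[[1]]
  # Keep the first string only and strip surrounding whitespace.
  trimws(as.character(value)[1])
}

# Mock value standing in for a real html2text() result.
mock <- list(text = "Some actual text\n")
main_text(mock)
```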

References

http://www.datasciencetoolkit.org/developerdocs#html2text

See Also

curlPerform, getCurlHandle, dynCurlReader

Examples

## Not run: 
html <- '<html><head><title>MyTitle</title></head><body><script 
         type="text/javascript">something();</script><div>Some actual 
         text</div></body></html>'
html2text(html)

## End(Not run)

rtelmore/RDSTK documentation built on May 12, 2019, 4:26 p.m.