htm2txt: Convert a html document to simple plain texts by removing all...

Description

Convert an HTML document to plain text by removing all HTML tags.

Usage

htm2txt(htm, list = "\n• ", pagebreak = "\n\n----------\n\n")

Arguments

htm

A character vector containing an HTML document to be converted into plain text. Other objects are coerced to character vectors.

list

A character string that replaces each <li>...</li> tag, acting as the number or bullet marker for list items.

pagebreak

A character string that replaces each <hr> tag, which marks a thematic change in the content or a page break.

Value

A character vector containing the plain text converted from the HTML document.

Examples

library(htm2txt)
text = htm2txt("<html><body>html texts</body></html>")
text = htm2txt(c("Hello<p>World", "Goodbye<br>Friends"))
# Use a custom marker for list items
text = htm2txt("<p>Menu:</p><ul><li>Coffee</li><li>Tea</li></ul>", list = "\n- ")
# Use a custom marker for <hr> page breaks
text = htm2txt("Page 1<hr>Page 2", pagebreak = "\n\n[NEW PAGE]\n\n")
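
To make the roles of the list and pagebreak arguments more concrete, the following is a minimal base-R sketch of the general idea of tag replacement followed by tag removal. It is not the package's implementation: strip_tags_sketch is a hypothetical helper written for illustration, and the real htm2txt may treat entities, whitespace, scripts, and other tags differently.

```r
# Hypothetical helper, NOT part of htm2txt: a rough approximation of
# "replace a few structural tags with markers, then drop all other tags".
strip_tags_sketch <- function(htm, list = "\n- ",
                              pagebreak = "\n\n----------\n\n") {
  txt <- as.character(htm)  # coerce other objects to character
  # An opening <li> tag becomes the list marker
  txt <- gsub("<li\\b[^>]*>", list, txt, ignore.case = TRUE, perl = TRUE)
  # An <hr> tag becomes the page-break marker
  txt <- gsub("<hr\\b[^>]*>", pagebreak, txt, ignore.case = TRUE, perl = TRUE)
  # Opening <p> and <br> tags become line breaks (an assumption of this sketch)
  txt <- gsub("<(p|br)\\b[^>]*>", "\n", txt, ignore.case = TRUE, perl = TRUE)
  # Every remaining tag is removed
  txt <- gsub("<[^>]+>", "", txt, perl = TRUE)
  trimws(txt)  # trim leading and trailing whitespace of each element
}

menu  <- strip_tags_sketch("<p>Menu:</p><ul><li>Coffee</li><li>Tea</li></ul>")
pages <- strip_tags_sketch("Page 1<hr>Page 2", pagebreak = "\n\n[NEW PAGE]\n\n")
cat(menu, "\n")
cat(pages, "\n")
```

The sketch handles only the tags named above and does not decode HTML entities, so for real documents htm2txt itself is the better choice.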

htm2txt documentation built on May 2, 2019, 9:56 a.m.
