HTMLdecode: Decode and Encode HTML Entities

View source: R/functions.R

HTMLencode    R Documentation

Decode and Encode HTML Entities

Description

Decode and encode HTML entities.

Usage

HTMLdecode(x, named = TRUE, hex = TRUE, decimal = TRUE)
HTMLencode(x, use.iconv = FALSE, encode.only = NULL)
HTMLrm(x, ...)

Arguments

x

HTMLdecode, HTMLencode: a character vector of length one; for HTMLrm: a character vector

use.iconv

logical. Should conversion via iconv be tried from native encoding to UTF-8?

named

logical: replace named character references?

hex

logical: replace hexadecimal character references?

decimal

logical: replace decimal character references?

encode.only

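character: the characters to be encoded. If NULL (the default), all characters for which named entities exist are encoded.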

...

other arguments

Details

HTMLdecode replaces named, hexadecimal and decimal character references as defined by HTML5 (see References) with characters. The resulting character vector is marked as UTF-8 (see Encoding).
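The numeric part of this replacement can be illustrated with a short base-R sketch. It is not the package's implementation: the helper name decode_numeric_refs is hypothetical, named references are not handled, and the regular expression is an assumption about what a well-formed numeric reference looks like.

```r
## Base-R sketch (not the textutils implementation): decode decimal and
## hexadecimal character references in a single string.
decode_numeric_refs <- function(s) {
    ## matches e.g. "&#228;" (decimal) and "&#xE4;" (hexadecimal)
    pattern <- "&#([0-9]+|[xX][0-9A-Fa-f]+);"
    m <- gregexpr(pattern, s, perl = TRUE)
    refs <- regmatches(s, m)[[1]]
    if (length(refs) == 0L)
        return(s)

    ## strip the leading "&#" and the trailing ";"
    body <- sub("^&#", "", sub(";$", "", refs))
    is_hex <- grepl("^[xX]", body)

    ## convert each reference to its Unicode code point
    code <- ifelse(is_hex,
                   strtoi(sub("^[xX]", "", body), base = 16L),
                   strtoi(body, base = 10L))

    ## replace each reference by the corresponding character
    regmatches(s, m) <- list(vapply(code, intToUtf8, character(1)))
    s
}

decoded <- decode_numeric_refs("M&#228;rchen &#x26; more")
```

Here decoded contains the letter a-umlaut in place of the decimal reference and an ampersand in place of the hexadecimal one. Named references such as those for the ampersand need a lookup table of entity names, which the sketch leaves out.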

HTMLencode replaces UTF-8-encoded substrings with HTML5 named entities (a.k.a. “named character references”). A semicolon ‘;’ will not be replaced by the entity ‘&semi;’. Other than that, however, HTMLencode is quite thorough in its job: it will replace all characters for which named entities exist, even ‘,’ or ‘?’. You can restrict the characters to be replaced by specifying encode.only.
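The effect of restricting the set of characters can be sketched in base R. This is an illustration only, not the package's code: the function encode_some and its four-entry lookup table are hypothetical, whereas HTML5 defines far more names. The ampersand is handled first so that the ampersands of freshly inserted entities are not encoded again.

```r
## Base-R sketch (not the textutils implementation): replace only a chosen
## set of characters with named entities.
encode_some <- function(s, encode.only = c("&", "<", ">")) {
    ## small hypothetical lookup table; "&" must stay first
    entities <- c("&" = "&amp;", "<" = "&lt;", ">" = "&gt;", "," = "&comma;")

    ## keep the table's order, restricted to the requested characters
    chars <- names(entities)[names(entities) %in% encode.only]

    for (ch in chars)
        s <- gsub(ch, entities[[ch]], s, fixed = TRUE)
    s
}

restricted <- encode_some("Max, Moritz & more")
wider      <- encode_some("Max, Moritz & more", encode.only = c(",", "&"))
```

With the default set the comma is left alone, so restricted is "Max, Moritz &amp; more"; with the comma added, wider is "Max&comma; Moritz &amp; more".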

HTMLrm removes HTML tags. All content between style and head tags is removed, as are comments. Note that each element of x is considered a single HTML document; so for multiline documents, collapse the lines into a single string first (for example, with paste and its collapse argument).
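Why collapsing matters can be shown with a small base-R sketch. The function strip_tags is a hypothetical stand-in built from two naive regular expressions; it is not HTMLrm and handles far fewer cases. The point is only that a comment or tag spanning several lines can be recognised as a unit once the lines form one string.

```r
## Lines as readLines() would return them: one element per line.
lines <- c("<p>before",
           "<!-- a comment",
           "     spanning two lines -->",
           "<a href='http://enricoschumann.net'>LINK</a> after</p>")

## Collapse into a single string, so the document is one element.
doc <- paste(lines, collapse = "\n")

## Hypothetical stand-in for HTMLrm, for illustration only:
## drop comments first, then tags (naive regular expressions).
strip_tags <- function(s) {
    s <- gsub("(?s)<!--.*?-->", "", s, perl = TRUE)
    gsub("<[^>]+>", "", s)
}

per_line <- strip_tags(lines)  # comment is split across elements and survives
whole    <- strip_tags(doc)    # comment is removed as a unit
```

With the package itself, the corresponding call would be HTMLrm(paste(lines, collapse = "\n")).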

Value

character

Author(s)

Enrico Schumann

References

https://www.w3.org/TR/html5/syntax.html#named-character-references

https://html.spec.whatwg.org/multipage/syntax.html#character-references

See Also

TeXencode

Examples

HTMLdecode(c("Max &amp; Moritz", "4 &lt; 9"))
## [1] "Max & Moritz" "4 < 9"

HTMLencode(c("Max & Moritz", "4 < 9"))
## [1] "Max &amp; Moritz" "4 &LT; 9"

HTMLencode("Max, Moritz & more")
## [1] "Max&comma; Moritz &amp; more"
HTMLencode("Max, Moritz & more", encode.only = c("&", "<", ">"))
## [1] "Max, Moritz &amp; more"


HTMLrm("before <a href='http://enricoschumann.net'>LINK</a>  after")
## [1] "before LINK  after"

textutils documentation built on May 29, 2024, 10:37 a.m.