HTMLdecode: Decode and Encode HTML Entities


HTMLencode    R Documentation

Decode and Encode HTML Entities

Description

Decode and encode HTML entities.

Usage

HTMLdecode(x, named = TRUE, hex = TRUE, decimal = TRUE)
HTMLencode(x, use.iconv = FALSE, encode.only = NULL)

Arguments

x

a character vector

use.iconv

logical. Should conversion via iconv be tried from native encoding to UTF-8?

named

logical: replace named character references?

hex

logical: replace hexadecimal character references?

decimal

logical: replace decimal character references?

encode.only

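character: the characters to be replaced. If NULL (the default), every character for which a named entity exists is replaced.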

Details

HTMLdecode replaces named, hexadecimal and decimal character references as defined by HTML5 (see References) with characters. The resulting character vector is marked as UTF-8 (see Encoding).
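To illustrate what replacing hexadecimal and decimal character references involves, here is a minimal base-R sketch. It is not the code used by HTMLdecode: the helper decode_numeric_refs is hypothetical, it ignores named references entirely, and it does no validation of the code points it finds.

```r
# Hypothetical helper, not part of textutils: decode numeric character
# references such as "&#60;" (decimal) and "&#x26;" (hexadecimal).
decode_numeric_refs <- function(x, hex = TRUE, decimal = TRUE) {
    replace_refs <- function(s, pattern, base) {
        m <- gregexpr(pattern, s, perl = TRUE)
        regmatches(s, m) <- lapply(regmatches(s, m), function(refs) {
            # strip the leading "&#" or "&#x" and the trailing ";"
            digits <- gsub("^&#[xX]?|;$", "", refs)
            # convert each code point to a single character
            vapply(strtoi(digits, base = base), intToUtf8, character(1))
        })
        s
    }
    if (hex)
        x <- replace_refs(x, "&#[xX][0-9A-Fa-f]+;", 16L)
    if (decimal)
        x <- replace_refs(x, "&#[0-9]+;", 10L)
    x
}

decode_numeric_refs(c("Max &#x26; Moritz", "4 &#60; 9"))
## [1] "Max & Moritz" "4 < 9"
```

The hex and decimal switches in this sketch only mimic the idea behind the arguments of the same name: each kind of reference is handled by its own pass and can be turned off independently.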

HTMLencode replaces UTF-8-encoded substrings with HTML5 named entities (a.k.a. “named character references”). A semicolon ‘;’ will not be replaced by the entity ‘&semi;’. Other than that, however, HTMLencode is quite thorough in its job: it will replace all characters for which named entities exist, even ‘,’ or ‘?’. You can restrict the characters to be replaced by specifying encode.only.
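The effect of restricting the replaced characters can be sketched in base R with a small lookup table. This is an illustration under stated assumptions, not the implementation of HTMLencode: entity_table and encode_some are hypothetical names, and the table covers only four characters rather than the full HTML5 list.

```r
# Hypothetical lookup table and helper, not part of textutils.
entity_table <- c("&" = "&amp;", "<" = "&lt;", ">" = "&gt;", "," = "&comma;")

encode_some <- function(x, encode.only = NULL) {
    chars <- names(entity_table)
    if (!is.null(encode.only))
        chars <- intersect(chars, encode.only)
    # "&" must be handled first; otherwise the ampersands introduced by
    # the other replacements would themselves be encoded
    chars <- c(intersect("&", chars), setdiff(chars, "&"))
    for (ch in chars)
        x <- gsub(ch, entity_table[[ch]], x, fixed = TRUE)
    x
}

encode_some("Max, Moritz & more")
## [1] "Max&comma; Moritz &amp; more"
encode_some("Max, Moritz & more", encode.only = c("&", "<", ">"))
## [1] "Max, Moritz &amp; more"
```

The semicolon is deliberately absent from the table in this sketch: every entity ends in ‘;’, so encoding it after the other replacements would corrupt the entities just written.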

Value

character

Author(s)

Enrico Schumann

References

https://www.w3.org/TR/html5/syntax.html#named-character-references

https://html.spec.whatwg.org/multipage/syntax.html#character-references

See Also

TeXencode

Examples

HTMLdecode(c("Max &amp; Moritz", "4 &lt; 9"))
## [1] "Max & Moritz" "4 < 9"

HTMLencode(c("Max & Moritz", "4 < 9"))
## [1] "Max &amp; Moritz" "4 &LT; 9"

HTMLencode("Max, Moritz & more")
## [1] "Max&comma; Moritz &amp; more"
HTMLencode("Max, Moritz & more", encode.only = c("&", "<", ">"))
## [1] "Max, Moritz &amp; more"

textutils documentation built on April 3, 2023, 5:34 p.m.