u_escape: Escape non-ASCII characters for portable R strings


Description

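Converts each non-ASCII character in a string to an escape sequence built from its Unicode code point: "\u1234" notation (four hexadecimal digits) when that is sufficient, "\U12345678" notation (eight hexadecimal digits) otherwise. The backslash is a literal character of the output string, so the result is what you would get by typing two backslashes in an R string literal.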
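The notation can be illustrated with a minimal base R sketch. The helper name escape_sketch is hypothetical and is not part of the package; the lowercase hexadecimal digits and the handling of edge cases (invalid input, existing backslashes) are assumptions, so the actual output of u_escape may differ in detail.

```r
# Hypothetical helper for illustration only, not the package's implementation.
escape_sketch <- function(x) {
  stopifnot(is.character(x), length(x) == 1L)
  cp <- utf8ToInt(enc2utf8(x))              # Unicode code points of x
  out <- ifelse(
    cp < 128L,
    intToUtf8(cp, multiple = TRUE),         # ASCII characters pass through
    ifelse(cp <= 0xFFFF,
           sprintf("\\u%04x", cp),          # four hex digits are sufficient
           sprintf("\\U%08x", cp))          # otherwise use eight hex digits
  )
  paste(out, collapse = "")
}

# The result holds a literal backslash followed by "u00f6".
res <- escape_sketch("Mot\u00f6rhead")
cat(res, "\n")
nchar(res)
```

Here cat shows the string as stored (one backslash), whereas print would display the backslash doubled, which is the distinction the description draws.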

Usage

u_escape(x, ranges = FALSE)

Arguments

x

a single character string.

ranges

a logical flag. If FALSE (the default), the function only returns the possibly modified string. If TRUE, the locations of the modifications are also returned.

Value

If ranges is FALSE, returns a character string: a copy of x in which non-ASCII characters have been replaced by escape sequences. If ranges is TRUE, returns a list with two elements:

"text"

the escaped string,

"ranges"

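a data.frame with two columns: "first" is the position in the output string of the first character of each escape sequence, and "last" is the position of its final character. Both are character positions in the substr sense, so they can be used as the start and stop arguments of substr.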
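As a rough illustration of how "first" and "last" relate to substr, the sketch below builds a data.frame of the described shape by hand. It does not call u_escape: the escaped string is typed in directly and the positions are located with a regular expression, both of which are assumptions made for the example.

```r
# Hand-written stand-in for the "text" element (assumed, not package output).
escaped <- "Mot\\u00f6rhead"

# Locate "\uXXXX" and "\UXXXXXXXX" sequences with a regular expression.
m <- gregexpr("\\\\u[0-9a-fA-F]{4}|\\\\U[0-9a-fA-F]{8}", escaped)[[1]]
ranges <- data.frame(
  first = as.integer(m),
  last  = as.integer(m) + attr(m, "match.length") - 1L
)

# Extract each escape sequence using the positions as start and stop.
seqs <- substring(escaped, ranges$first, ranges$last)
cat(seqs, sep = "\n")
```

A regular expression like this would also match text that already looked like an escape sequence before conversion, which is one reason a function reporting the positions it actually modified is more reliable than searching afterwards.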

Examples

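# A string with one non-ASCII character (U+00F6)
x <- "Mot\u00f6rhead"
# Return the escaped text together with the positions of the escape sequences
u_escape(x, ranges = TRUE)
# Build a string from raw bytes; f0 9f 98 80 is the UTF-8 encoding of U+1F600
x2 <- c(charToRaw("grinning face "), as.raw(c(0xf0, 0x9f, 0x98, 0x80)),
        charToRaw(" is code point U+1f600"))
x2 <- rawToChar(x2)
# Mark the bytes as UTF-8 so that they are read as one character
Encoding(x2) <- "UTF-8"
u_escape(x2)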

mvkorpel/uniscape documentation built on May 27, 2019, 11:55 a.m.