stri_enc_toutf8: Convert To UTF-8


Description

Converts character strings with (possibly) internally marked encodings to UTF-8 strings.

Usage

stri_enc_toutf8(str, is_unknown_8bit = FALSE)
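A minimal sketch of the default call (is_unknown_8bit = FALSE), assuming the stringi package is installed. The Latin-1 test string is built from raw bytes and marked explicitly, so the result does not depend on how the script file itself is encoded. The variable names are illustrative only.

```r
library(stringi)

# "café" as Latin-1 bytes (0xE9 is é in Latin-1), explicitly marked "latin1"
latin1 <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
Encoding(latin1) <- "latin1"

# A mix of ASCII, Latin1-marked, UTF-8-marked and missing values
x <- c("plain ASCII", latin1, "na\u00efve", NA)

# Default mode: the encoding marks are trusted, each string is converted to UTF-8
y <- stri_enc_toutf8(x)

print(Encoding(y[2:3]))   # both non-ASCII elements are now marked "UTF-8"
print(charToRaw(y[2]))    # é is now the two-byte UTF-8 sequence c3 a9
```

The output has the same length as the input, and NA elements stay NA.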

Arguments

str

character vector to be converted

is_unknown_8bit

a single logical value; see Details

Details

If is_unknown_8bit is set to TRUE and a string is internally marked as neither ASCII nor UTF-8, then every byte with a code greater than 127 is replaced with the Unicode REPLACEMENT CHARACTER (U+FFFD, written \Ufffd in R). Strings marked as bytes are treated the same way, as 8-bit strings.
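A minimal sketch of the is_unknown_8bit = TRUE mode, assuming stringi is installed. The input x is a hypothetical string holding the byte 0xE9 with no declared encoding, as might come from a file read without specifying its encoding. stri_count_fixed() is used here only to show how many replacement characters were inserted.

```r
library(stringi)

# "a", the byte 0xE9, "b": created by rawToChar(), so marked "unknown" (native)
x <- rawToChar(as.raw(c(0x61, 0xe9, 0x62)))
# A properly UTF-8-marked string, which should pass through unchanged
u <- "caf\u00e9"

y <- stri_enc_toutf8(c(x, u), is_unknown_8bit = TRUE)

# In the unmarked string, the byte > 127 became U+FFFD; the UTF-8 string is intact
print(y[1] == "a\ufffdb")
print(y[2] == u)

# Number of replacement characters in each result string
print(stri_count_fixed(y, "\ufffd"))
```

Because every byte above 127 is replaced individually, this mode never fails, but it discards the original characters.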

Otherwise, R's encoding marks are assumed to be trustworthy (ASCII, UTF-8, Latin1, or native), and each string is converted from its marked encoding to UTF-8. Strings marked as bytes cause an error in this mode.
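A minimal sketch contrasting the two modes on a bytes-marked string, assuming stringi is installed and relying on the behavior stated in the Details above. R does not put declared-encoding marks on pure-ASCII strings, so the example includes a byte above 127 for the "bytes" mark to take effect.

```r
library(stringi)

b <- rawToChar(as.raw(c(0x61, 0xe9)))   # "a" followed by the byte 0xE9
Encoding(b) <- "bytes"

# Default mode: marks are trusted, and a bytes-marked string raises an error
msg <- tryCatch({
  stri_enc_toutf8(b)
  NA_character_                         # reached only if no error was raised
}, error = function(e) conditionMessage(e))
print(msg)

# is_unknown_8bit = TRUE: the string is treated as 8-bit text instead
y <- stri_enc_toutf8(b, is_unknown_8bit = TRUE)
print(y == "a\ufffd")
```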

Note that the REPLACEMENT CHARACTER may be interpreted as a Unicode NA value for single characters.

Value

Returns a character vector.

See Also

Other encoding_conversion: stri_conv, stri_encode; stri_enc_fromutf32; stri_enc_toascii; stri_enc_toutf32; stringi-encoding


stringi documentation built on May 2, 2019, 4:54 p.m.