Conversion of UTF-8 encoded character vectors to and from integer vectors representing a UTF-32 encoding.
x: object to be converted.
multiple: logical: should the conversion be to a single character string or multiple individual characters?
These will work in any locale, including on platforms that do not otherwise support multi-byte character sets.
Unicode defines a name and a number for each of the glyphs it
encompasses: the numbers are called code points. Since RFC 3629
they run from 0 to 0x10FFFF (with about 12% being
assigned by version 10.0 of the Unicode standard).
intToUtf8 does not handle surrogate pairs (which should not
occur in UTF-32): inputs in the surrogate ranges are mapped to
NA.
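For instance, passing a code point from the surrogate range (0xD800 to 0xDFFF) yields NA, consistent with the behaviour described above:

```r
# Surrogate code points are not valid Unicode scalar values,
# so intToUtf8() maps them to NA rather than encoding them.
intToUtf8(0xD800)   # NA
```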
utf8ToInt converts a length-one character string encoded in
UTF-8 to an integer vector of Unicode code points. It checks validity
of the input. (Currently it accepts UTF-8 encodings of code points
greater than 0x10FFFF: these are no longer regarded as valid by
the UTF-8 RFC and will in future be mapped to NA. Following
'Corrigendum 9', the UTF-8 encodings of the noncharacters
0xFFFE and 0xFFFF are regarded as
valid as from R 3.4.3.)
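A minimal illustration of the decoding direction: each character of the input string becomes one integer code point.

```r
# Decode a UTF-8 string into its Unicode code points.
utf8ToInt("Hi")       # 72 105
# The euro sign U+20AC has code point 8364.
utf8ToInt("\u20AC")   # 8364
```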
intToUtf8 converts a numeric vector of Unicode code points
either (default) to a single character string or a character vector of
single characters. Non-integral numeric values are truncated to
integers: values above the maximum are mapped to NA. For a
single character string, 0 is silently omitted: otherwise
0 is mapped to NA. The Encoding of a
non-NA return value is declared as "UTF-8". Invalid and
NA inputs are mapped to NA output.
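The encoding direction and the handling of 0 and out-of-range values described above can be sketched as:

```r
# Rebuild a string from code points: 0 is silently dropped in
# single-string mode, but maps to NA when multiple = TRUE.
intToUtf8(c(82, 0, 33))                    # "R!"
intToUtf8(c(82, 0, 33), multiple = TRUE)   # "R" NA "!"

# Round trip: decode and re-encode.
intToUtf8(utf8ToInt("Hi"))                 # "Hi"

# Values above the maximum code point 0x10FFFF map to NA.
intToUtf8(0x110000)                        # NA
```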
https://tools.ietf.org/html/rfc3629, the current standard for UTF-8.
http://www.unicode.org/versions/corrigendum9.html for non-characters.