The function checks whether the given sequences of bytes form proper UTF-8 strings.
stri_enc_isutf8(str)
str: character vector, a raw vector, or a list of raw vectors
A negative answer means that a string is surely not valid UTF-8. A positive result does not guarantee that the string was actually encoded in UTF-8. E.g. the byte pair (c4,85) properly represents "Polish a with ogonek" in UTF-8 as well as the two characters ("A umlaut", "Ellipsis") in WINDOWS-1250. Also note that UTF-8, as well as most 8-bit encodings, has ASCII as a subset (note that stri_enc_isascii => stri_enc_isutf8).
However, the longer the sequence, the more likely it is that a positive result really does indicate UTF-8, because not all sequences of bytes are valid UTF-8.
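The ambiguity described above can be reproduced directly. The following is a minimal sketch, assuming the stringi package is installed and that ICU recognizes the converter name "windows-1250". The variable names are illustrative only.

library(stringi)

# The two bytes discussed above: 0xC4 0x85.
bytes <- as.raw(c(0xc4, 0x85))

# They form a valid UTF-8 sequence (U+0105, "a with ogonek") ...
ambiguous_ok <- stri_enc_isutf8(bytes)
print(ambiguous_ok)

# ... but they also decode cleanly as two WINDOWS-1250 characters
# (U+00C4 "A umlaut" followed by U+2026 "Ellipsis").
as_cp1250 <- stri_encode(bytes, from = "windows-1250", to = "UTF-8")
print(as_cp1250)

# 0xB9 0xE6 is "a with ogonek" + "c with acute" in WINDOWS-1250.
# 0xB9 is a UTF-8 continuation byte and cannot start a sequence,
# so here the negative answer is definite.
not_utf8 <- stri_enc_isutf8(as.raw(c(0xb9, 0xe6)))
print(not_utf8)

A positive result on a short input is therefore only weak evidence, while a negative result is conclusive.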
This function is independent of the way R marks encodings in character strings (see Encoding and stringi-encoding).
Returns a logical vector. Its i-th element indicates whether the i-th string corresponds to a valid UTF-8 byte sequence.
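To illustrate the shape of the return value, here is a small sketch that passes several byte sequences at once. It assumes the stringi package is installed and that list input consists of raw vectors, one per string. The name `inputs` is illustrative only.

library(stringi)

# Three byte sequences, one raw vector per string.
inputs <- list(
  charToRaw("abc"),        # pure ASCII, hence also valid UTF-8
  as.raw(c(0xc4, 0x85)),   # a valid two-byte UTF-8 sequence
  as.raw(c(0xff, 0xfe))    # 0xFF and 0xFE never occur in UTF-8
)

result <- stri_enc_isutf8(inputs)

# One logical value per input, in the same order.
print(length(result) == length(inputs))
print(result)

The third element is FALSE because the bytes 0xFF and 0xFE cannot appear anywhere in a well-formed UTF-8 sequence.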
Other encoding_detection: stri_enc_detect2; stri_enc_detect; stri_enc_isascii; stri_enc_isutf16be, stri_enc_isutf16le, stri_enc_isutf32be, stri_enc_isutf32le; stringi-encoding
stri_enc_isutf8(letters[1:3])
stri_enc_isutf8("\u0105\u0104")
stri_enc_isutf8("\u1234\u0222")