stri_enc_isutf8 | R Documentation |
The function checks whether the given sequences of bytes form proper UTF-8 strings.
stri_enc_isutf8(str)
str | character vector, a raw vector, or a list of raw vectors |
FALSE means that a string is certainly not valid UTF-8.
However, false positives are possible. For instance, the byte sequence (c4,85) represents 'a with ogonek' in UTF-8 as well as 'A umlaut' followed by 'Ellipsis' in WINDOWS-1250.
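The ambiguity of these two bytes can be reproduced directly. The following is a minimal sketch, assuming that stringi is installed and that the underlying ICU build provides a windows-1250 converter; the variable names are illustrative only.

```r
library(stringi)

# The two bytes discussed above, as a raw vector.
x <- as.raw(c(0xc4, 0x85))

# The bytes form a well-formed UTF-8 sequence, so the check passes...
is_utf8 <- stri_enc_isutf8(x)  # TRUE

# ...yet the same bytes decode to different text depending on the
# encoding we assume for them.
as_utf8   <- stri_encode(x, from = "UTF-8", to = "UTF-8")         # U+0105
as_cp1250 <- stri_encode(x, from = "windows-1250", to = "UTF-8")  # U+00C4 U+2026
```

A TRUE result therefore says only that the bytes could be UTF-8, not that they were produced as UTF-8.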
Also note that UTF-8, as well as most 8-bit encodings, extends ASCII (so whenever stri_enc_isascii returns TRUE, stri_enc_isutf8 returns TRUE as well).
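The relationship between the two checks can be illustrated as follows. This is a small sketch with made-up input strings, assuming stringi is installed.

```r
library(stringi)

# Two pure-ASCII strings and one string with non-ASCII letters
# (written with \u escapes, which R stores as UTF-8).
x <- c("abc", "R 4.3", "za\u017c\u00f3\u0142\u0107")

ascii <- stri_enc_isascii(x)  # TRUE TRUE FALSE
utf8  <- stri_enc_isutf8(x)   # TRUE TRUE TRUE

# Every string that passes the ASCII check also passes the UTF-8 check.
ascii_implies_utf8 <- all(utf8[ascii])
```

The converse does not hold: the third string is valid UTF-8 but not ASCII.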
The longer the sequence, the higher the probability that a positive result really means UTF-8, because not all sequences of bytes are valid UTF-8.
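To see that some byte sequences are rejected outright, consider the sketch below. It passes a list of raw vectors chosen for illustration; it assumes stringi is installed.

```r
library(stringi)

bytes <- list(
  as.raw(c(0x61, 0x62)),  # ASCII "ab"
  as.raw(c(0xc4, 0x85)),  # well-formed two-byte sequence
  as.raw(0xc4),           # lead byte with its continuation byte missing
  as.raw(c(0xff, 0xfe))   # 0xff and 0xfe never occur in UTF-8
)

# One logical value per list element.
res <- stri_enc_isutf8(bytes)  # TRUE TRUE FALSE FALSE
```

Each additional multi-byte character in an input is another opportunity for a non-UTF-8 byte pattern to fail the check, which is why longer inputs give more reliable positives.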
This function is independent of the way R marks encodings in character strings (see Encoding and stringi-encoding).
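As an illustration of this independence, the sketch below deliberately attaches a misleading encoding mark to a string. It assumes stringi is installed; the expected result follows from the statement above that only the bytes are inspected.

```r
library(stringi)

# Build a string from the bytes c4 85 and mark it as latin1 on purpose.
x <- rawToChar(as.raw(c(0xc4, 0x85)))
Encoding(x) <- "latin1"

mark <- Encoding(x)         # "latin1"
res  <- stri_enc_isutf8(x)  # TRUE: the mark is ignored, the bytes are checked
```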
Returns a logical vector. Its i-th element indicates whether the i-th string corresponds to a valid UTF-8 byte sequence.
Marek Gagolewski and other contributors
The official online manual of stringi at https://stringi.gagolewski.com/
Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02
Other encoding_detection: about_encoding, stri_enc_detect2(), stri_enc_detect(), stri_enc_isascii(), stri_enc_isutf16be()
stri_enc_isutf8(letters[1:3])
stri_enc_isutf8('\u0105\u0104')
stri_enc_isutf8('\u1234\u0222')