View source: R/encoding_management.R
stri_enc_mark | R Documentation |
Reads declared encodings for each string in a character vector as seen by stringi.
stri_enc_mark(str)
str | character vector or an object coercible to a character vector |
According to Encoding, R has a simple encoding marking mechanism: strings can be declared to be in latin1, UTF-8, or bytes. Moreover, we may check (via the R/C API) whether a string is in ASCII (R assumes that this holds if and only if all bytes in a string are not greater than 127, so there is an implicit assumption that your platform uses an encoding that extends ASCII) or in the system's default (a.k.a. unknown in Encoding) encoding.
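As an illustration of the marking mechanism described above, here is a minimal base R sketch. The variable names are made up for this example, and the single byte 0xE4 is an arbitrary non-ASCII byte chosen only to have something to mark.

```r
# An ASCII-only string and a UTF-8 string built from Unicode escapes.
ascii_str <- "abc"
utf8_str  <- "za\u017c\u00f3\u0142\u0107"

# One non-ASCII byte (0xE4), explicitly declared to be latin1.
latin1_str <- rawToChar(as.raw(0xe4))
Encoding(latin1_str) <- "latin1"

# The same byte, declared to be raw bytes instead.
bytes_str <- rawToChar(as.raw(0xe4))
Encoding(bytes_str) <- "bytes"

# Encoding() reports the declared mark of each element.
# ASCII-only strings carry no mark and are reported as "unknown".
marks <- Encoding(c(ascii_str, utf8_str, latin1_str, bytes_str))
print(marks)
```

Note that Encoding reports "unknown" both for ASCII-only strings and for strings in the default encoding; it does not distinguish the two.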
Intuitively, the default encoding should be equivalent to the one you use on stdin (e.g., your 'keyboard'). In stringi we assume that such an encoding is equivalent to the one returned by stri_enc_get. It is automatically detected by ICU to match, by default, the encoding part of the LC_CTYPE category as given by Sys.getlocale.
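To see what stringi treats as the default encoding on a given machine, something like the following sketch can be used. It assumes the stringi package is installed; the variable names are illustrative, and the printed values depend on the platform and locale.

```r
library(stringi)

# The encoding stringi currently treats as the default (native) one.
native_enc <- stri_enc_get()

# The LC_CTYPE locale category as reported by base R, for comparison.
ctype <- Sys.getlocale("LC_CTYPE")

cat("stringi default encoding:", native_enc, "\n")
cat("LC_CTYPE locale category:", ctype, "\n")
```

On many systems the encoding part of LC_CTYPE (for example, the suffix after the dot) corresponds to the value returned by stri_enc_get, but the exact strings may be spelled differently.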
Returns a character vector of the same length as str. Unlike in the Encoding function, here the possible encodings are: ASCII, latin1, bytes, native, and UTF-8. Additionally, missing values are handled properly.
This gives exactly the same data that is used by all the functions in stringi to re-encode their inputs.
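The following sketch compares stri_enc_mark with Encoding on one vector. It assumes stringi is installed; the test strings and variable names are invented for illustration, and 0xE4 is again an arbitrary non-ASCII byte.

```r
library(stringi)

# A non-ASCII byte with no declared encoding (treated as the default one).
unmarked <- rawToChar(as.raw(0xe4))

# Copies of the same byte with explicit declarations.
latin1_str <- unmarked
Encoding(latin1_str) <- "latin1"
bytes_str <- unmarked
Encoding(bytes_str) <- "bytes"

x <- c("abc", "\u0105", latin1_str, bytes_str, unmarked, NA)

# stri_enc_mark separates ASCII from native and keeps NA as NA.
marks <- stri_enc_mark(x)
print(marks)

# Base R's view of the same vector, for comparison.
print(Encoding(x))
```

Here the ASCII-only element is expected to be reported as ASCII and the unmarked non-ASCII element as native, whereas Encoding labels both as unknown; the missing value stays NA in the stri_enc_mark result.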
Marek Gagolewski and other contributors
The official online manual of stringi at https://stringi.gagolewski.com/
Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02
Other encoding_management: about_encoding, stri_enc_info(), stri_enc_list(), stri_enc_set()