encoding | R Documentation
Functions for testing and adapting the (declared) encoding of the components of a vector of mode character.
is.utf8(x)
is.ascii(x)
is.locale(x)
translate(x, recursive = FALSE, internal = FALSE)
fixEncoding(x, latin1 = FALSE)
x | a vector (of character).
recursive | option to process list components.
internal | option to use internal translation.
latin1 | option to assume "latin1" encoding for strings that are not valid UTF-8.
is.utf8 tests if the components of a vector of character are true UTF-8 strings, i.e. contain one or more valid UTF-8 multi-byte sequences.
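A rough base-R approximation of the is.utf8 test may make the definition concrete. This is a sketch, not the package implementation: it assumes R >= 3.3.0 for validUTF8(), and the helper names has_non_ascii and is_utf8_sketch are hypothetical.

```r
## Hypothetical approximation of is.utf8() using base R only (R >= 3.3.0);
## this is not the package implementation.
has_non_ascii <- function(x)
    vapply(x, function(s) any(as.integer(charToRaw(s)) > 127L),
           logical(1), USE.NAMES = FALSE)

## Valid UTF-8 *and* at least one byte >= 0x80. In valid UTF-8 such a byte
## always belongs to a multi-byte sequence, so pure ASCII gives FALSE.
is_utf8_sketch <- function(x) validUTF8(x) & has_non_ascii(x)

x <- c("aa", "a\xc3\xa4", "a\xe4")   # ASCII, UTF-8 bytes of "aä", latin1 bytes
is_utf8_sketch(x)
## [1] FALSE  TRUE FALSE
```

Note that "a\xe4" is rejected because the byte 0xe4 starts a three-byte UTF-8 sequence that is never completed.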
is.ascii tests if the components of a vector of character consist of ASCII characters only.
is.locale tests if the components of a vector of character are in the encoding of the current locale.
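For the common cases, a test in the spirit of is.locale can be approximated with base R's l10n_info(). The sketch below is hypothetical and deliberately simplified (in_locale_sketch is not part of the package): it handles UTF-8 and Latin-1 locales and otherwise accepts only ASCII, so other multi-byte locales such as EUC-JP are not treated correctly.

```r
## Hypothetical, simplified approximation of is.locale() (base R, R >= 3.3.0).
in_locale_sketch <- function(x) {
    info <- l10n_info()
    if (isTRUE(info[["UTF-8"]]))
        return(validUTF8(x))            # UTF-8 locale: bytes must be valid UTF-8
    if (isTRUE(info[["Latin-1"]]))
        return(rep(TRUE, length(x)))    # every byte is a Latin-1 character
    ## any other locale (e.g. "C"): simplification, accept plain ASCII only
    vapply(x, function(s) all(as.integer(charToRaw(s)) < 128L),
           logical(1), USE.NAMES = FALSE)
}

in_locale_sketch(c("aa", "a\xe4"))
## in a UTF-8 locale: TRUE FALSE
```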
translate converts the components of a vector of character to the encoding of the current locale. This includes the names attribute of vectors of arbitrary mode. If recursive = TRUE, the components of a list are processed as well. If internal = TRUE, multi-byte sequences that are invalid in the encoding of the current locale are changed to literal hex numbers (see FIXME).
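The handling of names and of recursive lists described above can be sketched with base R's enc2native(), which converts strings according to their declared encoding. This is a hypothetical illustration (translate_sketch is not part of the package), and it does not implement the internal = TRUE behaviour.

```r
## Hypothetical sketch of translate(x, recursive) using base R only.
translate_sketch <- function(x, recursive = FALSE) {
    if (is.character(x)) {
        x[] <- enc2native(x)               # convert, keeping attributes
    } else if (recursive && is.list(x)) {
        x[] <- lapply(x, translate_sketch, recursive = TRUE)
    }
    if (!is.null(names(x)))                # names of a vector of any mode
        names(x) <- enc2native(names(x))
    x
}

v <- c(1, 2)
names(v) <- c("a", "b\xe4")
Encoding(names(v)) <- "latin1"             # declare the names as latin1
names(translate_sketch(v))                 # "a" "bä" in a UTF-8 locale
```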
fixEncoding sets the declared encoding of the components of a vector of character to their correct or preferred values. If latin1 = TRUE, strings that are not valid UTF-8 are declared to be in "latin1". Strings that are true UTF-8 strings, on the other hand, are declared to be in "UTF-8" encoding.
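The declaration rules above can be approximated with validUTF8() and the Encoding<- replacement function. This is a hypothetical sketch (fix_encoding_sketch is not the package code) and assumes R >= 3.3.0; it only changes the declared encoding, never the bytes.

```r
## Hypothetical approximation of fixEncoding(x, latin1); not the package code.
fix_encoding_sketch <- function(x, latin1 = FALSE) {
    valid <- validUTF8(x)
    non_ascii <- vapply(x, function(s) any(as.integer(charToRaw(s)) > 127L),
                        logical(1), USE.NAMES = FALSE)
    ## true UTF-8 strings: valid and containing a multi-byte sequence
    Encoding(x)[valid & non_ascii] <- "UTF-8"
    ## optionally assume latin1 for everything that is not valid UTF-8
    if (latin1)
        Encoding(x)[!valid] <- "latin1"
    x
}

x <- c("aa", "a\xc3\xa4", "a\xe4")
Encoding(fix_encoding_sketch(x, latin1 = TRUE))
## returns c("unknown", "UTF-8", "latin1")
```

ASCII strings such as "aa" always report "unknown", since R does not mark pure ASCII strings with an encoding.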
The same type of object as x, with the (declared) encoding possibly changed.
Currently translate uses iconv and is therefore not guaranteed to work on all platforms.
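Since translate depends on iconv, it may help to see how base R's iconv() itself treats bytes that are invalid in the stated source encoding. The snippet uses only documented iconv() arguments; the exact error handling comes from the platform's iconv library and can differ between systems.

```r
## Base R's iconv(): input that cannot be converted gives NA by default,
## or is replaced when 'sub' is supplied.
x <- "a\xe4b"                                          # latin1 bytes, invalid as UTF-8
iconv(x, from = "UTF-8", to = "ASCII")                 # NA: invalid input
iconv(x, from = "UTF-8", to = "ASCII", sub = "byte")   # "a<e4>b": hex escape of the byte
iconv(x, from = "latin1", to = "UTF-8")                # converts once the source encoding is right
```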
Christian Buchta
FIXME PCRE, RFC 3629
Encoding and iconv.
## Note that we assume R runs in a UTF-8 locale
text <- c("aa", "a\xe4")
Encoding(text) <- c("unknown", "latin1")
is.utf8(text)
is.ascii(text)
is.locale(text)
## implicit translation
text
## explicit conversion; non-ASCII results are declared "UTF-8"
t1 <- iconv(text, from = "latin1", to = "UTF-8")
Encoding(t1)
## oops: with to = "utf-8" the result is not declared "UTF-8"
t2 <- iconv(text, from = "latin1", to = "utf-8")
Encoding(t2)
t2
is.locale(t2)
## repair the declared encoding
t2 <- fixEncoding(t2)
Encoding(t2)
## explicit translation
t3 <- translate(text)
Encoding(t3)