nchar takes a character vector as an argument and
returns a vector whose elements contain the sizes of
the corresponding elements of
x. Internally, it is a generic,
for which methods can be defined (see InternalMethods).
nzchar is a fast way to find out if elements of a character
vector are non-empty strings.
nchar(x, type = "chars", allowNA = FALSE, keepNA = NA)
nzchar(x, keepNA = FALSE)
x: character vector, or a vector to be coerced to a character vector. Giving a factor is an error.
type: character string: partial matching to one of "bytes", "chars" or "width".
The ‘size’ of a character string can be measured in one of
three ways (corresponding to the type used):
"bytes": The number of bytes needed to store the string (plus in C a final terminator which is not counted).
"chars": The number of characters.
"width": The number of columns cat will use to print the string in a monospaced font. The same as "chars" if this cannot be calculated.
These will often be the same, and usually will be in single-byte
locales (but note how
type determines the default for
keepNA). There will be differences between the first two with
multibyte character sequences, e.g. in UTF-8 locales.
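As a minimal sketch of how the first two measures diverge for a multibyte string, consider a single accented letter. The call to enc2utf8() is an assumption of this sketch (it is not part of the text above); it is used so that the byte count does not depend on the session locale.

```r
# One accented letter, forced to UTF-8 so the byte count is locale-independent.
s <- enc2utf8("\u00e9")

n_bytes <- nchar(s, type = "bytes")  # storage size in bytes
n_chars <- nchar(s, type = "chars")  # number of characters

c(bytes = n_bytes, chars = n_chars)
```

In UTF-8 this letter occupies two bytes but counts as one character.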
The internal equivalent of the default method of
as.character is performed on
x (so there is no
method dispatch). If you want to operate on non-vector objects,
passing them through
deparse first will be required.
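The following sketch illustrates the two conversions mentioned so far: deparsing a non-vector object before measuring it, and converting a factor explicitly, since giving a factor directly is an error. The object names (expr, f and so on) are made up for illustration.

```r
# A language object is not a vector, so turn it into text first.
expr <- quote(a + b)
txt  <- deparse(expr)   # "a + b"
n    <- nchar(txt)

# A factor has to be converted explicitly; nchar(f) itself signals an error.
f   <- factor(c("low", "high"))
n_f <- nchar(as.character(f))
factor_fails <- inherits(try(nchar(f), silent = TRUE), "try-error")
```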
For nchar, an integer vector giving the sizes of each element.
For missing values (i.e., NA_character_), nchar() returns NA_integer_ when keepNA is true, and
2, the number of printing characters, if false.
type = "width" gives (an approximation to) the number of
columns used in printing each element in a terminal font, taking into
account double-width, zero-width and ‘composing’ characters.
The approximation is likely to be poor when there are unassigned or
If allowNA = TRUE and an element is detected as invalid in a
multi-byte character set such as UTF-8, its number of characters and
the width will be
NA. Otherwise the number of characters will
be non-negative, so
!is.na(nchar(x, "chars", TRUE)) is a test
A character string marked with
"bytes" encoding (see
Encoding) has a number of bytes, but neither a known
number of characters nor a width, so the latter two types are
NA if allowNA = TRUE, otherwise an error.
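A short sketch of both cases, assuming a deliberately invalid input: the single byte 0xff, which is never valid UTF-8. The variable names are illustrative only.

```r
# Mark an invalid byte sequence as UTF-8 and test it next to a valid string.
bad <- "\xff"
Encoding(bad) <- "UTF-8"
ok <- enc2utf8("abc")

n <- nchar(c(ok, bad), type = "chars", allowNA = TRUE)
is_valid <- !is.na(n)   # validity test based on allowNA = TRUE

# The same byte marked as "bytes": only the byte count is defined.
raw_str <- "\xff"
Encoding(raw_str) <- "bytes"
b_bytes <- nchar(raw_str, type = "bytes")
b_chars <- nchar(raw_str, type = "chars", allowNA = TRUE)
```

Without allowNA = TRUE, the "chars" and "width" calls on these inputs would signal an error rather than return NA.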
Names, dims and dimnames are copied from the input.
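For instance, a small sketch with a character matrix and a named vector (both inputs invented for illustration) shows the attributes being carried over:

```r
# A 2 x 2 character matrix with row and column names.
m <- matrix(c("a", "bb", "ccc", "dddd"), nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2")))
sizes <- nchar(m)   # integer matrix with the same dim and dimnames

# A named character vector keeps its names.
v <- nchar(c(first = "one", second = "three"))
```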
For nzchar, a logical vector of the same length as x,
true if and only if the element has non-zero size; if the element is
NA, nzchar() is true when
keepNA is false (the default) and NA otherwise.
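A minimal sketch of nzchar() on an empty string, a non-empty string and a missing value (the input vector is invented for illustration):

```r
x <- c("", "a", NA)   # a character vector; the NA is NA_character_

nz_default <- nzchar(x)                 # keepNA = FALSE
nz_keep    <- nzchar(x, keepNA = TRUE)  # missing values stay missing
```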
This does not by default give the number of characters that
will be used to
print() the string. Use
encodeString to find that.
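The difference can be seen in a short sketch using a string that contains a newline, which print() shows as an escape sequence:

```r
s <- "a\nb"

n_raw     <- nchar(s)                             # the newline is one character
n_printed <- nchar(encodeString(s))               # newline escaped as backslash + n
n_quoted  <- nchar(encodeString(s, quote = '"'))  # also counts the surrounding quotes
```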
Where character strings have been marked as UTF-8, the number of
characters and widths will be computed in UTF-8, even though printing
may use escapes such as <U+2642> in a non-UTF-8 locale.
The concept of ‘width’ is a slippery one even in a monospaced
font. Some human languages have the concept of combining
characters, in which two or more characters are rendered together: an
example would be
"y\u306", which is two characters of width
one: combining characters are given width zero, and there are other
zero-width characters such as the zero-width space "\u200b".
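As a rough sketch, the three measures can be compared for a combining sequence and for the zero-width space. The use of enc2utf8() and the variable names are assumptions of this sketch; the reported widths depend on the width tables of the R version in use, so none are stated here.

```r
comb <- enc2utf8("y\u306")   # 'y' followed by a combining breve
zwsp <- enc2utf8("\u200b")   # zero-width space

# One column per string; rows are the three types of size.
sizes <- sapply(c(comb = comb, zwsp = zwsp), function(s)
  c(chars = nchar(s, "chars"),
    bytes = nchar(s, "bytes"),
    width = nchar(s, "width")))
sizes
```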
Some East Asian languages have ‘wide’ characters, ideographs
which are conventionally printed across two columns when mixed with
ASCII and other ‘narrow’ characters in those languages. The
problem is that whether a computer prints wide characters over two or
one columns depends on the font, with it not being uncommon to use two
columns in a font intended for East Asian users and a single column in
a ‘Western’ font. Unicode has encodings for ‘fullwidth’
versions of ASCII characters and ‘halfwidth’ versions of
Katakana (Japanese) and Hangul (Korean) characters. Then there is the
‘East Asian Ambiguous class’ (Greek, Cyrillic, signs, some
accented Latin chars, etc), for which the historical practice was to
use two columns in East Asia and one elsewhere. The width quoted by
nchar for characters in that class (and some others) depends on
the locale, being one except in some East Asian locales on some OSes
Control characters are usually given width zero: this includes CR and LF. Computing the width of a string containing control characters should be avoided (and may depend on the OS and R version).
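To see what a given installation reports for the wide characters discussed above, a sketch along the following lines can be used. The sample strings are chosen for illustration, and no widths are quoted because, as described, they can vary with locale, OS and R version.

```r
wide  <- enc2utf8("\u65e5\u672c")  # two CJK ideographs
fullA <- enc2utf8("\uff21")        # FULLWIDTH LATIN CAPITAL LETTER A
strs  <- c(wide, fullA, "A")

chars  <- nchar(strs, type = "chars")  # character counts are unambiguous
widths <- nchar(strs, type = "width")  # column counts: platform-dependent

data.frame(chars, widths)
```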
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Unicode Standard Annex #11: East Asian Width. https://www.unicode.org/reports/tr11/
strwidth giving width of strings for plotting;
x <- c("asfef", "qwerty", "yuiop[", "b", "stuff.blah.yech")
nchar(x)
# 5  6  6  1 15

nchar(deparse(mean))
# 18 17  <-- unless mean differs from base::mean

## NA behaviour as function of keepNA=* :
logi <- setNames(, c(FALSE, NA, TRUE))
sapply(logi, \(k) data.frame(nchar  = nchar (NA, keepNA=k),
                             nzchar = nzchar(NA, keepNA=k)))

## make the third element missing
x[3] <- NA; x
nchar(x, keepNA= TRUE) #  5  6 NA  1 15
nchar(x, keepNA=FALSE) #  5  6  2  1 15
stopifnot(identical(nchar(x     ), nchar(x, keepNA= TRUE)),
          identical(nchar(x, "w"), nchar(x, keepNA=FALSE)),
          identical(is.na(x), is.na(nchar(x))))

##' nchar() for all three types :
nchars <- function(x, ...)
   vapply(c("chars", "bytes", "width"),
          function(tp) nchar(x, tp, ...), integer(length(x)))

nchars("\u200b") # in R versions (>= 2015-09-xx):
## chars bytes width
##     1     3     0

data.frame(x, nchars(x)) ## all three types : same unless for NA
## force the same by forcing 'keepNA':
(ncT <- nchars(x, keepNA = TRUE))  ## .... NA NA NA ....
(ncF <- nchars(x, keepNA = FALSE)) ## ....  2  2  2 ....
stopifnot(apply(ncT, 1, function(.) length(unique(.))) == 1,
          apply(ncF, 1, function(.) length(unique(.))) == 1)