utf8_nchar: Count the number of characters in a character vector

View source: R/utf8.R

utf8_ncharR Documentation

Count the number of characters in a character vector

Description

By default it counts Unicode grapheme clusters, instead of code points.

Usage

utf8_nchar(x, type = c("chars", "bytes", "width", "graphemes", "codepoints"))

Arguments

x

Character vector, it is converted to UTF-8.

type

Whether to count graphemes (characters), code points, bytes, or calculate the display width of the string.

Value

Numeric vector, the length of the strings in the character vector.

See Also

Other UTF-8 string manipulation: utf8_graphemes(), utf8_substr()

Examples

# Grapheme example, emoji with combining characters. This is a single
# grapheme, consisting of five Unicode code points:
# * `\U0001f477` is the construction worker emoji
# * `\U0001f3fb` is emoji modifier that changes the skin color
# * `\u200d` is the zero width joiner
# * `\u2640` is the female sign
# * `\ufe0f` is variation selector 16, requesting an emoji style glyph
emo <- "\U0001f477\U0001f3fb\u200d\u2640\ufe0f"
cat(emo)

utf8_nchar(emo, "chars") # = graphemes
utf8_nchar(emo, "bytes")
utf8_nchar(emo, "width")
utf8_nchar(emo, "codepoints")

# For comparision, the output for width depends on the R version used:
nchar(emo, "chars")
nchar(emo, "bytes")
nchar(emo, "width")

cli documentation built on June 22, 2024, 10:57 a.m.