u_char_names: Unicode Character Names

u_char_names    R Documentation

Unicode Character Names

Description

Find the names or labels of Unicode characters, or find Unicode characters by name.

Usage

u_char_name(x)
u_char_from_name(x, type = c("exact", "grep"), ...)
u_char_label(x)

Arguments

x

for u_char_name and u_char_label, an R object that can be coerced to a u_char vector of Unicode characters via as.u_char; for u_char_from_name, a character vector giving the names to look up or, when type = "grep", the pattern to match.

type

one of "exact" or "grep", or an abbreviation thereof.

...

arguments to be passed to grepl when type = "grep" is used for pattern matching.

Details

The Unicode Standard provides a convention for labeling code points that do not have character names (control, reserved, noncharacter, private-use and surrogate code points). These labels can be obtained by u_char_label.
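As a minimal sketch of the label convention, assuming the Unicode package is installed and attached, the following asks for the labels of four code points that have no character name. The code points chosen here (one control, one surrogate, one private-use and one noncharacter code point) are illustrative and are not taken from the package documentation.

```r
library(Unicode)

## Code points without character names, given as integers:
## U+0000 (control), U+D800 (surrogate), U+E000 (private use),
## U+FFFF (noncharacter).
x <- as.u_char(c(0x0000L, 0xD800L, 0xE000L, 0xFFFFL))

## One label per code point, returned as a character vector.
lab <- u_char_label(x)
lab
```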

By default, exact matching is used for finding Unicode characters by name. When type = "grep", grepl is used for matching x against the Unicode character names; for now, Hangul syllable and CJK Unified Ideograph names are ignored in this case.
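The pattern-matching mode can be sketched as follows, assuming the Unicode package is attached. The example relies on the documented behavior that arguments in ... are passed on to grepl; ignore.case is a standard grepl argument, and the lower-case pattern is an illustrative choice.

```r
library(Unicode)

## Extra arguments in ... are handed on to grepl(); ignore.case = TRUE
## lets a lower-case pattern match the upper-case character names.
x <- u_char_from_name("\\bdigit one\\b", type = "grep", ignore.case = TRUE)

## Inspect the names of the matched characters.
u_char_name(x)
```

Because Hangul syllable and CJK Unified Ideograph names are currently skipped in this mode, characters from those ranges have to be looked up with the default exact matching.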

Value

For u_char_name and u_char_label, a character vector with the names or labels, respectively, of the corresponding Unicode characters.

For u_char_from_name, a u_char object giving the Unicode characters whose names match the given names exactly or, when type = "grep", match the given pattern.

Examples

x <- as.u_char(utf8ToInt("Austria"))
u_char_name(x)

## Derived Hangul syllable character names are also supported for
## finding characters by exact matching:
x <- u_char_name("0xAC00")
x
u_char_from_name(x)

## Find all Unicode characters whose name matches 'DIGIT ONE' as whole
## words ("g" abbreviates type = "grep").
x <- u_char_from_name("\\bDIGIT ONE\\b", "g")
## And show their names.
u_char_name(x)

Unicode documentation built on May 29, 2024, 2:36 a.m.