unicode_code_points: Get Unicode code points

hex2ucp R Documentation

Get Unicode code points

Description

hex2ucp(), int2ucp(), name2ucp(), and str2ucp() return Unicode code points as character vectors. block2ucp() and range2ucp() return the code points in a Unicode block or range. is_ucp() returns TRUE if x is a valid Unicode code point.

Usage

hex2ucp(x)

int2ucp(x)

str2ucp(x)

name2ucp(x, type = c("exact", "grep"), ...)

is_ucp(x)

block2ucp(x, omit_unnamed = TRUE)

range2ucp(x, omit_unnamed = TRUE)

Arguments

x

R objects coercible to the respective Unicode character data types. See Unicode::as.u_char() for hex2ucp() and int2ucp(), base::utf8ToInt() for str2ucp(), Unicode::u_char_from_name() for name2ucp(), Unicode::as.u_char_range() for range2ucp(), and Unicode::u_blocks() for block2ucp().

type

one of "exact" or "grep", or an abbreviation thereof.

...

arguments passed to grepl() when type = "grep" is used for pattern matching.

omit_unnamed

Omit control codes or unassigned code points

Details

hex2ucp(x) is a wrapper for as.character(Unicode::as.u_char(x)). int2ucp(x) is a wrapper for as.character(Unicode::as.u_char(as.integer(x))). str2ucp(x) is a wrapper for as.character(Unicode::as.u_char(utf8ToInt(x))). name2ucp(x) is a wrapper for as.character(Unicode::u_char_from_name(x)). Unlike the wrapped calls, however, these functions coerce missing values to NA_character_ instead of "<NA>".

Note that the names of bm_font() objects must be character vectors as returned by these functions, not Unicode::u_char objects.
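
As a rough illustration of the string format these functions return, the base-R sketch below builds "U+XXXX" strings without the Unicode package. fmt_ucp() is a hypothetical helper, not part of bittermelon; it only mimics the output format and the NA_character_ handling described above, and it does no validation.

```r
# Hypothetical helper (not part of bittermelon): format integers as "U+XXXX",
# mapping missing values to NA_character_ rather than a string.
fmt_ucp <- function(x) {
  x <- as.integer(x)
  ifelse(is.na(x), NA_character_, sprintf("U+%04X", x))
}

from_int <- fmt_ucp(82L)                       # decimal 82
from_hex <- fmt_ucp(strtoi("52", base = 16L))  # hexadecimal 52
from_str <- fmt_ucp(utf8ToInt("R"))            # the character "R"
with_na  <- fmt_ucp(c(82L, NA))                # second element stays missing
```

All three of from_int, from_hex, and from_str hold the same string, which mirrors how int2ucp(), hex2ucp(), and str2ucp() reach the same code point from different inputs.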

Value

A character vector of Unicode code points.

See Also

ucp2label() and is_combining_character().

Examples

  # These are all different ways to get the same 'R' code point
  hex2ucp("52")
  hex2ucp(as.hexmode("52"))
  hex2ucp("0052")
  hex2ucp("U+0052")
  hex2ucp("0x0052")
  int2ucp(82) # 82 == as.hexmode("52")
  int2ucp("82") # 82 == as.hexmode("52")
  int2ucp(utf8ToInt("R"))
  ucp2label("U+0052")
  name2ucp("LATIN CAPITAL LETTER R")
  str2ucp("R")

  # Potential gotcha: hexadecimal "52" and decimal "82" are the same integer (82L)
  all.equal(hex2ucp(52L), int2ucp(52L)) # TRUE
  all.equal(hex2ucp("52"), int2ucp("82")) # TRUE
  isTRUE(all.equal(hex2ucp("82"), int2ucp("82"))) # FALSE

  block2ucp("Basic Latin")
  block2ucp("Basic Latin", omit_unnamed = FALSE)
  range2ucp("U+0020..U+0030")
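
The gotcha in the examples above comes down to how the input digits are parsed, which can be checked in base R alone. This is a minimal sketch that uses only strtoi() and as.integer(); the variable names are illustrative.

```r
# The same digits mean different integers in base 16 and base 10.
hex_52 <- strtoi("52", base = 16L)  # hexadecimal 52
dec_82 <- as.integer("82")          # decimal 82
hex_82 <- strtoi("82", base = 16L)  # hexadecimal 82

hex_52 == dec_82  # TRUE: both are 82L, the code point of "R"
hex_82 == dec_82  # FALSE: hexadecimal 82 is 130L
```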


bittermelon documentation built on Feb. 16, 2023, 8:08 p.m.