tools: Chinese Numerals Detection and Extraction

tools    R Documentation

Chinese Numerals Detection and Extraction

Description

Functions to detect and extract Chinese numerals in character objects and strings.

Usage

is_cnum(
  x,
  lang = default_cnum_lang(),
  mode = "casual",
  financial = FALSE,
  literal = FALSE,
  strict = FALSE,
  ...
)

has_cnum(
  x,
  lang = default_cnum_lang(),
  mode = "casual",
  financial = FALSE,
  ...
)

extract_cnum(
  x,
  lang = default_cnum_lang(),
  mode = "casual",
  financial = FALSE,
  prefix = NULL,
  suffix = NULL,
  ...
)

Arguments

x

the character object or string to be tested or to extract from.

lang

the language of the Chinese numerals. "tc" for Traditional Chinese. "sc" for Simplified Chinese. The default is "tc", but this can be changed by setting options(cnum.lang = "sc").
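
The option-based default described above can be pictured with base R's getOption(). This is only a sketch: the body of default_cnum_lang() is not shown on this page, so the helper name sketch_default_lang is hypothetical and simply assumes that the function reads the cnum.lang option and falls back to "tc".

```r
# Hypothetical stand-in for default_cnum_lang(); the real implementation may differ.
sketch_default_lang <- function() {
  getOption("cnum.lang", default = "tc")
}

old <- options(cnum.lang = NULL)   # make sure the option is unset, remember the old value
lang_unset <- sketch_default_lang()

options(cnum.lang = "sc")          # switch the session default to Simplified Chinese
lang_set <- sketch_default_lang()

options(old)                       # restore whatever was set before
```

Because lang is an ordinary argument, passing lang = "sc" in a single call would override the session-wide option for that call only.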

mode

the scale naming system to be enforced. See the ‘Details’ section for the list of supported modes.

financial

logical: should the financial numerals be used (daxie shuzi)?

literal

logical: should the numerals be converted literally? (e.g. 721 to be converted to "qi er yi" instead of "qibai ershiyi" and vice versa)

strict

logical: should the Chinese numeral format be strictly enforced? A casual test only checks whether x consists of Chinese numeral characters. A strict test checks whether x is a valid Chinese numeral. (e.g. "yi bai yi" will pass the casual test and fail the strict test)

...

optional arguments to be passed to grepl (for is_cnum and has_cnum) or str_extract_all (for extract_cnum). Disregarded when strict = TRUE.

prefix

the prefix of the Chinese numerals. Only numerals with the designated prefix are extracted. Supports regular expression(s).

suffix

the suffix of the Chinese numerals. Only numerals with the designated suffix are extracted. Supports regular expression(s).
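
One way to picture prefix and suffix filtering is with regular-expression lookarounds, so that the surrounding text must be present but is not returned. The sketch below uses base R only; sketch_extract and the small character class are assumptions for illustration, not the patterns cnum builds internally, and extract_cnum itself is documented as wrapping str_extract_all.

```r
# Illustrative numeral character class (a small assumed subset, not the package's):
# 一 二 三 四 五 六 七 八 九 十 百 千 萬 億
num <- "[\u4e00\u4e8c\u4e09\u56db\u4e94\u516d\u4e03\u516b\u4e5d\u5341\u767e\u5343\u842c\u5104]+"

# Hypothetical helper: wrap the numeral pattern in lookarounds so the prefix
# and suffix are required but excluded from the match.
# Note: PCRE lookbehind needs a fixed-length prefix pattern.
sketch_extract <- function(x, prefix = NULL, suffix = NULL) {
  pattern <- num
  if (!is.null(prefix)) pattern <- paste0("(?<=", prefix, ")", pattern)
  if (!is.null(suffix)) pattern <- paste0(pattern, "(?=", suffix, ")")
  regmatches(x, gregexpr(pattern, x, perl = TRUE))
}

x <- "\u5341\u56db\u5104\u4eba, \u4e09\u767e\u5143"   # 十四億人, 三百元
all_nums  <- sketch_extract(x)                      # every run of numeral characters
yuan_only <- sketch_extract(x, suffix = "\u5143")   # only runs followed by 元
```

Like extract_cnum, the helper returns a list with one character vector per element of x.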

Value

is_cnum returns a logical vector indicating, for each element of x, whether it is a Chinese numeral.

has_cnum returns a logical vector indicating, for each element of x, whether it contains Chinese numerals.

extract_cnum returns a list of character vectors containing the extracted Chinese numerals.

Functions

  • is_cnum(): Test if a character object is a Chinese numeral. A wrapper around grepl.

  • has_cnum(): Test if a string contains Chinese numerals. A wrapper around grepl.

  • extract_cnum(): Extract Chinese numerals from a string. A wrapper around str_extract_all from stringr.
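
The difference between the two grepl-based tests can be sketched as anchored versus unanchored matching. The character class and the helper names sketch_is and sketch_has below are assumptions for illustration; the actual patterns used by is_cnum and has_cnum are not shown on this page.

```r
# Assumed subset of numeral characters: 一 二 三 四 五 六 七 八 九 十 百 千
num_class <- "[\u4e00\u4e8c\u4e09\u56db\u4e94\u516d\u4e03\u516b\u4e5d\u5341\u767e\u5343]"

# Whole string must consist of numeral characters (is_cnum-like, casual test).
sketch_is  <- function(x) grepl(paste0("^", num_class, "+$"), x)
# At least one numeral character anywhere in the string (has_cnum-like).
sketch_has <- function(x) grepl(num_class, x)

x <- c("\u4e00\u767e\u4e8c\u5341\u4e00",   # 一百二十一
       "\u4e00\u767e\u516b\u5341\u5143",   # 一百八十元
       "abc")
is_result  <- sketch_is(x)
has_result <- sketch_has(x)
```

Under these assumptions only the first element passes the whole-string test, while the first two pass the containment test, because the second element ends in the non-numeral character 元.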

Details

The following scale naming systems are supported:

  • "casual": the casual naming system used outside of mainland China, e.g. 1e+12 is referred to as "yi zhao".

  • "casualPRC": the casual naming system used in mainland China, e.g. 1e+12 is referred to as "yi wanyi".

  • "SIprefix": the SI prefix system used in Taiwan as stipulated in the document Names, Definitions and Symbols of the Legal Units of Measurement and the Decimal Multiples and Submultiples.

  • "SIprefixPRC": the SI prefix system used in mainland China as stipulated in the document China Statutory Measurement Units.

  • "SIprefixPRClong": a variant of "SIprefixPRC" with long prefixes, e.g. 1e+09 is referred to as "yi jika" instead of "yi ji".

References

The standard for mode "SIprefix", Names, Definitions and Symbols of the Legal Units of Measurement and the Decimal Multiples and Submultiples, is available from https://gazette.nat.gov.tw/egFront/detail.do?metaid=108965 (in Traditional Chinese).

The standard for mode "SIprefixPRC", The State Council's Order on the Unified Implementation of Legal Measurement Units in Our Country, is available from the PRC State Council's website (in Simplified Chinese).

See Also

Functions for conversion

Examples

is_cnum("一百二十一")      # yibai ershiyi (121)

has_cnum("一百八十元")     # yibai bashi yuan (180 yuan)

extract_cnum("十四億人")   # shisiyi ren (1.4 billion people)


elgarteo/cnum documentation built on Jan. 12, 2025, 7:13 p.m.