tools: Chinese Numerals Detection and Extraction

Description

Functions to detect and extract Chinese numerals in character objects and strings.

Usage

is_cnum(
  x,
  lang = default_cnum_lang(),
  mode = "casual",
  financial = FALSE,
  literal = FALSE,
  strict = FALSE,
  ...
)

has_cnum(
  x,
  lang = default_cnum_lang(),
  mode = "casual",
  financial = FALSE,
  ...
)

extract_cnum(
  x,
  lang = default_cnum_lang(),
  mode = "casual",
  financial = FALSE,
  prefix = NULL,
  suffix = NULL,
  ...
)

Arguments

x

the character object or string to be tested or to extract from.

lang

the language of the Chinese numerals. "tc" for Traditional Chinese. "sc" for Simplified Chinese. The default is "tc", but this can be changed by setting options(cnum.lang = "sc").

mode

the scale naming system to be enforced. See the ‘Details’ section for the list of supported modes.

financial

logical: should the financial numerals be used (daxie shuzi)?

literal

logical: should the numerals be converted literally? (e.g. 721 to be converted to "qi er yi" instead of "qibai ershiyi" and vice versa)

strict

logical: should the Chinese numeral format be strictly enforced? A casual test only checks whether x contains Chinese numeral characters. A strict test checks whether x is a valid Chinese numeral (e.g. "yi bai yi" passes the casual test but fails the strict test).

...

optional arguments to be passed to grepl (for is_cnum and has_cnum) or str_extract_all (for extract_cnum). Disregarded when strict = TRUE.

prefix

the prefix of the Chinese numerals. Only numerals with the designated prefix are extracted. Supports regular expression(s).

suffix

the suffix of the Chinese numerals. Only numerals with the designated suffix are extracted. Supports regular expression(s).
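
As a rough illustration of how lang, suffix and the cnum.lang option fit together, the sketch below assumes the cnum package is installed. The sample sentence is invented for this example (it is not from the package), and no output is shown because the exact result may depend on the installed version.

```r
library(cnum)

# Hypothetical sample text, invented for this sketch
# (roughly: "The price is 180 yuan, 3 people in total").
txt <- "价格一百八十元，共三人"

# Extract every Chinese numeral, reading the text as Simplified Chinese.
all_nums <- extract_cnum(txt, lang = "sc")

# Restrict extraction to numerals that carry the designated suffix.
prices <- extract_cnum(txt, lang = "sc", suffix = "元")

# Change the session default so that `lang` can be omitted afterwards.
old <- options(cnum.lang = "sc")
with_default <- extract_cnum(txt)
options(old)  # restore the previous setting
```

Saving and restoring the option keeps the change local to the example, so other code in the session is not affected.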

Value

is_cnum returns a logical vector indicating, for each element of x, whether it is a Chinese numeral.

has_cnum returns a logical vector indicating, for each element of x, whether it contains Chinese numerals.

extract_cnum returns a list of character vectors containing the extracted Chinese numerals.
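
To make the return shapes concrete, here is a minimal sketch that calls the three functions on the same input and inspects only the type and length of each result. It assumes the cnum package is installed; the input vector is made up for illustration.

```r
library(cnum)

# Hypothetical input: a bare numeral, a numeral inside a phrase, and plain ASCII.
x <- c("一百二十一", "一百八十元", "abc")

is_res  <- is_cnum(x)       # logical vector, one value per element of x
has_res <- has_cnum(x)      # logical vector, one value per element of x
ext_res <- extract_cnum(x)  # list of character vectors

# Inspect the structure rather than the values.
str(is_res)
str(has_res)
str(ext_res)
```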

Functions

Details

The following scale naming systems are supported:

References

The standard for mode "SIprefix", Names, Definitions and Symbols of the Legal Units of Measurement and the Decimal Multiples and Submultiples, is available from https://gazette.nat.gov.tw/egFront/detail.do?metaid=108965 (in Traditional Chinese).

The standard for mode "SIprefixPRC", China Statutory Measurement Units, is available from http://gkml.samr.gov.cn/nsjg/jls/201902/t20190225_291134.html (in Simplified Chinese).

See Also

Functions for conversion

Examples

is_cnum("一百二十一")  # yibai ershiyi

has_cnum("一百八十元")  # yibai bashi yuan

extract_cnum("十四億人")  # shisiyi ren

cnum documentation built on Jan. 13, 2021, 7:53 p.m.