UnicodeOperators: Unicode Pattern Operators

Description

Invert, union, intersect, or take the set difference of Unicode general categories and Unicode properties.

Usage

unicode_inverse(x, char_class = TRUE)

unicode_union(..., char_class = TRUE)

unicode_intersect(x, y, char_class = TRUE)

unicode_setdiff(x, y, char_class = TRUE)

Arguments

x

A character vector containing Unicode general categories or Unicode properties. Use the functional forms (ugc_*()), not the constants.

char_class

TRUE or FALSE. Should the values be wrapped into a character class?

...

Character vectors containing Unicode general categories or Unicode properties. Use the functional forms (ugc_*()), not the constants.

y

A character vector containing Unicode general categories or Unicode properties. Use the functional forms (ugc_*()), not the constants.

Note

Use these with ICU-based regular expression engines (stringi and stringr).
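
As a rough illustration of why an ICU-based engine is needed, the sketch below writes set intersection (&&) and set difference (--) by hand in ICU character-class syntax and passes them straight to stringi. The pattern strings are hand-written here and are only assumed to resemble what the operators on this page produce; the variable names are illustrative.

```r
library(stringi)

# "\u00e9" is e-acute, "\u00c9" is E-acute (escaped to keep the source ASCII)
x <- c("a", "G", "f", "7", "\u00e9")

# Intersection: characters that are letters AND ASCII hex digits
hex_letters <- stri_detect_regex(x, "[\\p{L}&&\\p{ASCII_Hex_Digit}]")

# Difference: uppercase letters other than "A" to "Z"
non_ascii_upper <- stri_detect_regex(
  c("A", "\u00c9", "a"),
  "[\\p{Lu}--[A-Z]]"
)

print(hex_letters)
print(non_ascii_upper)
```

The && and -- set operators inside a character class are part of ICU's regular expression syntax, so patterns like these should not be expected to work with base R's default regex engine.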

References

http://userguide.icu-project.org/strings/unicodeset

Examples

# POSIX [:punct:] is more or less equivalent to the union of
# Unicode punctuation and symbol general categories
unicode_union(ugc_punctuation(), ugc_symbol())

# Everything except "A" to "Z" (including punctuation, control chars etc.)
unicode_inverse("[A-Z]")

# Uppercase letters, except "A" to "Z"
unicode_setdiff(ugc_uppercase_letter(), "[A-Z]")

# "A" to "F" (in upper or lower case)
unicode_intersect(ugc_letter(), up_ascii_hex_digit())

# Usage
x <- c(letters, LETTERS)
rx <- unicode_intersect(ugc_letter(), up_ascii_hex_digit())
stringi::stri_extract_first_regex(x, rx)
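
The first example above relates POSIX [:punct:] to the union of the punctuation and symbol categories. The sketch below compares the two under stringi, using a hand-written union pattern that is only assumed to match the kind of pattern unicode_union() builds. It relies on ICU's own [[:punct:]] class covering the punctuation category only, which is why the union can be useful.

```r
library(stringi)

chars <- c("!", "+", "$", "a")

# ICU's [[:punct:]] class: punctuation category only
icu_punct <- stri_detect_regex(chars, "[[:punct:]]")

# Hand-written union of the punctuation (P) and symbol (S) categories
punct_or_symbol <- stri_detect_regex(chars, "[\\p{P}\\p{S}]")

print(data.frame(chars, icu_punct, punct_or_symbol))
```

Here "+" and "$" belong to symbol categories rather than punctuation, so they are picked up only by the union pattern.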

rebus.unicode documentation built on May 2, 2019, 6:40 a.m.