Character classes

Description

Match character classes.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
alnum(lo, hi, char_class = TRUE)

alpha(lo, hi, char_class = TRUE)

blank(lo, hi, char_class = TRUE)

cntrl(lo, hi, char_class = TRUE)

digit(lo, hi, char_class = TRUE)

graph(lo, hi, char_class = TRUE)

lower(lo, hi, char_class = TRUE)

printable(lo, hi, char_class = TRUE)

punct(lo, hi, char_class = TRUE)

space(lo, hi, char_class = TRUE)

upper(lo, hi, char_class = TRUE)

hex_digit(lo, hi, char_class = TRUE)

any_char(lo, hi)

dgt(lo, hi, char_class = TRUE)

wrd(lo, hi, char_class = TRUE)

spc(lo, hi, char_class = TRUE)

not_dgt(lo, hi, char_class = TRUE)

not_wrd(lo, hi, char_class = TRUE)

not_spc(lo, hi, char_class = TRUE)

ascii_digit(lo, hi, char_class = TRUE)

ascii_lower(lo, hi, char_class = TRUE)

ascii_upper(lo, hi, char_class = TRUE)

ascii_alpha(lo, hi, char_class = TRUE)

ascii_alnum(lo, hi, char_class = TRUE)

char_range(lo, hi, char_class = lo < hi)

Arguments

lo

A non-negative integer. Minimum number of repeats, when grouped.

hi

positive integer. Maximum number of repeats, when grouped.

char_class

A logical value. Should x be wrapped in a character class? If NA, the function guesses whether that's a good idea.

Value

A character vector representing part or all of a regular expression.

Note

R has many built-in locale-dependent character classes, like [:alnum:] (representing lower or upper case letters or numbers). There are also some generic character classes like \w (representing lower or upper case letters or numbers or underscores). Finally, there are ASCII-only ways of specifying letters like a-zA-Z. Which version you want depends upon how you want to deal with international characters, and the vagaries of the underlying regular expression engine. I suggest reading the regex help page and doing lots of testing.

References

http://www.regular-expressions.info/shorthand.html and http://www.rexegg.com/regex-quickstart.html#posix

See Also

regex, Unicode

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
# R character classes
alnum()
alpha()
blank()
cntrl()
digit()
graph()
lower()
printable()
punct()
space()
upper()
hex_digit()

# Generic classes
any_char()
dgt()
wrd()
spc()

# Generic negated classes
not_dgt()
not_wrd()
not_spc()

# Non-locale-specific classes
ascii_digit()
ascii_lower()
ascii_upper()

# Don't provide a class wrapper
digit(char_class = FALSE) # same as DIGIT

# Match repeated values
digit(3)
digit(3, 5)
digit(0)
digit(1)
digit(0, 1)

# Ranges of characters
char_range(0, 7) # octal number

# Usage
(rx <- digit(3))
stringi::stri_detect_regex(c("123", "one23"), rx)