re2_regexp | R Documentation |
re2_regexp
compiles a character string containing a regular
expression and returns a pointer to the object.
re2_regexp(pattern, ...)
pattern |
Character string containing a regular expression. |
... |
Options, which are (defaults in parentheses):
The following options are only consulted when
The |
Compiled regular expression.
RE2 regular expression syntax is similar to Perl's with some of
the more complicated things thrown away. In particular,
backreferences and generalized assertions are not available, nor
is \Z.
See re2_syntax for the syntax supported by RE2, and a comparison with PCRE and Perl regexps.
For those not familiar with Perl's regular expressions, here are some examples of the most commonly used extensions:
"hello (\\w+) world" | -- | \w matches a "word" character. |
"version (\\d+)" | -- | \d matches a digit. |
"hello\\s+world" | -- | \s matches any whitespace character. |
"\\b(\\w+)\\b" | -- | \b matches non-empty string at word boundary. |
"(?i)hello" | -- | (?i) turns on case-insensitive matching. |
"/\\*(.*?)\\*/" | -- | .*? matches . minimum no. of times possible. |
The double backslashes are needed when writing R string literals. However, they should NOT be used when writing raw string literals:
r"(hello (\w+) world)" | -- | \w matches a "word" character. |
r"(version (\d+))" | -- | \d matches a digit. |
r"(hello\s+world)" | -- | \s matches any whitespace character. |
r"(\b(\w+)\b)" | -- | \b matches non-empty string at word boundary. |
r"((?i)hello)" | -- | (?i) turns on case-insensitive matching. |
r"(/\*(.*?)\*/)" | -- | .*? matches . minimum no. of times possible. |
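The equivalence of the two notations can be checked in base R without compiling a pattern. This is a minimal sketch, assuming R 4.0.0 or later (where raw string literals are available); the variable names are illustrative only.

```r
# Ordinary string literal: every backslash must be doubled.
escaped <- "hello (\\w+) world"

# Raw string literal: backslashes are written once.
raw <- r"(hello (\w+) world)"

# Both literals spell the same pattern text, so either form
# can be passed as the 'pattern' argument.
identical(escaped, raw)  # TRUE

# Print the pattern as the regular expression engine will see it.
cat(escaped, "\n")
```

Either form works; raw strings tend to be easier to read once a pattern contains several escapes.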
When using UTF-8 encoding, case-insensitive matching will perform simple case folding, not full case folding.
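The difference between simple and full case folding can be probed with the sharp s. The sketch below assumes the re2 package is installed and reuses only re2_regexp and re2_detect from this page; the test strings are illustrative. Simple folding maps one character to one character, whereas full folding would also equate "ß" with "ss".

```r
# Runs only when the re2 package is available (an assumption of this sketch).
if (requireNamespace("re2", quietly = TRUE)) {
  ci <- re2::re2_regexp("stra\u00dfe", case_sensitive = FALSE)

  # One-to-one folding candidate: capital sharp s (U+1E9E).
  print(re2::re2_detect("STRA\u1e9eE", ci))

  # One-to-two folding candidate: would match only under full case folding.
  print(re2::re2_detect("STRASSE", ci))
}
```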
See re2_syntax for the regular expression syntax.
re2p <- re2_regexp("hello world")
stopifnot(mode(re2p) == "externalptr")
## UTF-8 and matching interface
# By default, pattern and input text are interpreted as UTF-8.
# The Latin1 option causes them to be interpreted as Latin-1.
x <- "fa\xE7ile"
Encoding(x) <- "latin1"
re2_detect(x, re2_regexp("fa\xE7", encoding = "Latin1"))
## Case insensitive
re2_detect("fOobar ", re2_regexp("Foo", case_sensitive = FALSE))
## Literal string (as opposed to regular expression)
## Matches only when 'literal' option is TRUE
re2_detect("foo\\$bar", re2_regexp("foo\\$b", literal = TRUE))
re2_detect("foo\\$bar", re2_regexp("foo\\$b", literal = FALSE))
## Use of never_nl
re <- re2_regexp("(abc(.|\n)*def)", never_nl = FALSE)
re2_match("abc\ndef\n", re)
re <- re2_regexp("(abc(.|\n)*def)", never_nl = TRUE)
re2_match("abc\ndef\n", re)