options(markdown.HTML.template = system.file("misc", "docco-template.html", package = "knitr"))
library(re2r)
A regular expression (RE) is a way to search for matches in strings. The search is done by applying a "pattern" to the string.
You probably know the * and ? characters used in the dir command on the command line. The * character means "zero or more arbitrary characters" and the ? means "one arbitrary character".
When using a pattern like text?.*, it will find files like textf.txt, text1.R, and text9.Rmd.
REs work on the same principle, and they supply many more patterns.
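To make the link between command-line wildcards and REs concrete, here is a minimal sketch in base R (not re2r). It uses utils::glob2rx() to translate the wildcard pattern into a regular expression; the file names beyond the three mentioned above are made up for illustration.

```r
# Hypothetical file listing; the last two names are added as non-matches.
files <- c("textf.txt", "text1.R", "text9.Rmd", "text.txt", "mytext1.R")

# Translate the wildcard pattern into a regular expression.
rx <- utils::glob2rx("text?.*")

# Keep only the file names that match the translated pattern.
matched <- files[grepl(rx, files)]
print(rx)
print(matched)
```

Printing rx shows how ? becomes . (one arbitrary character) and how the literal dot is escaped.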
Example usages could be:
Basically we can do the following operations on a string with REs:
Search through a string for a pattern, and return boolean result or matched substrings.
Search for a substring, and return that substring.
Search for a substring that matches a pattern, and replace it by another string.
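As a rough illustration of these three operations, the sketch below uses base R functions (grepl, regmatches with regexpr, and sub) rather than re2r, so that it runs without the package installed. The re2r equivalents are introduced in the overview that follows.

```r
x <- "this is just one test"

# 1. Search for a pattern and return a logical result.
found <- grepl("o.e", x)

# 2. Search for a pattern and return the matched substring.
matched <- regmatches(x, regexpr("o.e", x))

# 3. Search for a pattern and replace the first match with another string.
replaced <- sub("o.e", "my", x)

print(found)
print(matched)
print(replaced)
```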
Here is a quick overview of the most common functions for applying a regular expression in re2r.
re2_detect(string, pattern)
Searches the string for a pattern and returns a logical (boolean) result.
re2_detect("this is just one test", "(o.e)")
The . character stands for any character, possibly including newline. For more syntax, you can check out the RE2 Syntax vignette.
Here is an example of an email pattern.
show_regex("\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,4}\\b", width = 670, height = 280)
re2_detect("test@gmail.com", "\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,4}\\b")
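To see what this email pattern accepts and rejects, here is a small sketch that applies it to a few inputs. It uses base R's grepl() with perl = TRUE instead of re2_detect(), purely so it runs without re2r; the extra test strings are made up for illustration.

```r
email_pattern <- "\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,4}\\b"

# Hypothetical inputs: one plain address, one non-address,
# and one address whose top-level domain is longer than four letters.
candidates <- c("test@gmail.com", "not an email", "user@example.museum")

is_email <- grepl(email_pattern, candidates, perl = TRUE)
print(setNames(is_email, candidates))
```

The last input is rejected because {2,4} limits the final part to between two and four letters, which is worth keeping in mind before using this pattern for validation.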
re2_match(string, pattern)
This function will return the capture groups in ().
(res = re2_match("this is just one test", "(o.e)"))
str(res)
The result is a character matrix. The column .1 is the first capture group, which is unnamed.
We can create a named capture group with the (?P<name>pattern) syntax.
(res = re2_match("this is just one test", "(?P<testname>this)( is)"))
str(res)
If the pattern has no capture group, the matched strings themselves are returned.
test_string = c("this is just one test", "the second test")
(res = re2_match(test_string, "is"))
str(res)
re2_match_all() returns all matches of the pattern in a string instead of just the first one.
re2_match_all( string = c("this is test", "this is test, and this is not test", "they are tests"), pattern = "(?P<testname>this)( is)")
re2_replace(string, pattern, rewrite)
Searches the input string for the occurrence(s) of a substring that matches the pattern and replaces the found substrings with the rewrite text.
input_string = "this is just one test"
new_string = "my"
re2_replace(input_string, "(o.e)", new_string)
re2_extract(input, pattern)
Searches the input string for the occurrence(s) of a substring that matches the pattern and returns the found substrings.
re2_extract("yabba dabba doo", "yabba")
re2_extract("test@me.com", "(.*)@([^.]*)")
We can create a regular expression object (RE2 object) from a string. Reusing this object avoids parsing the syntax of the same pattern again each time it is used.
It also gives us more options for the pattern. Run help(re2) to get more details.
regexp = re2("test", case_sensitive = FALSE)
print(regexp)
Use the parallel option to enable multithreading. It can improve performance for large inputs on a multi-core CPU.
re2_match(string, pattern, parallel = TRUE)
Base R functions such as regexpr use PCRE when given the perl = TRUE argument. PCRE includes many useful features, such as named capture, but has exponential worst-case time complexity.
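For comparison with re2_match(), here is a minimal sketch of named capture with base R's regexpr() and perl = TRUE. The match object carries the capture.start, capture.length, and capture.names attributes, and the groups have to be cut out of the string by hand; the variable names are chosen only for this illustration.

```r
x <- "this is just one test"
m <- regexpr("(?P<testname>this)( is)", x, perl = TRUE)

# Positions and names of the capture groups are stored as attributes.
starts  <- as.vector(attr(m, "capture.start"))
lengths <- as.vector(attr(m, "capture.length"))
group_names <- attr(m, "capture.names")

# Cut each group out of the original string.
groups <- substring(x, starts, starts + lengths - 1)
names(groups) <- group_names
print(groups)
```

The unnamed second group gets an empty name here, whereas the re2_match() result shown earlier labels unnamed groups by position.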
Base R functions such as regexpr use TRE when given the perl = FALSE argument. TRE has a polynomial time complexity but does not include named capture groups.
stringr::str_match and stringi::stri_match use the regex engine from the ICU library, which has exponential worst-case time complexity. The stringi package does not support named capture yet, as this feature set is still considered experimental in ICU.
RE2 is a primarily DFA-based regexp engine from Google that is very fast at matching large amounts of text. It has a polynomial time complexity (or is fast and scalable in the general case), but it does not support lookbehind and some other regular expression features.
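To illustrate what giving up lookbehind means in practice, the sketch below extracts the domain of an address in two ways using base R. The first uses a PCRE lookbehind (perl = TRUE); the second rewrites the same extraction with a capture group, which is one common workaround for engines without lookbehind. Both run in base R here so the example does not need re2r; the second pattern is written only with constructs that RE2 is expected to accept, but that is an assumption, not something tested against re2r.

```r
x <- "test@me.com"

# PCRE only: lookbehind asserts that "@" precedes the match without consuming it.
domain_lookbehind <- regmatches(x, regexpr("(?<=@)[a-z.]+", x, perl = TRUE))

# Lookbehind-free rewrite: match the "@" explicitly and keep the capture group.
domain_capture <- sub("^[^@]*@([a-z.]+)$", "\\1", x)

print(domain_lookbehind)
print(domain_capture)
```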
Although they differ slightly in use (because of the design of the engines), all are quite similar to Perl's implementation of REs.
Benchmarks are disabled by default for CRAN. See https://qinwenfeng.com/re2r_doc for the results by Travis-CI.