re_match: Extract Regular Expression Matches Into a Data Frame

Description

re_match wraps regexpr and returns the match results in a convenient data frame. The data frame has one column for each capture group if perl=TRUE, and one final column called .match for the matching (sub)string. The capture group columns are named if the groups themselves are named.
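The idea of wrapping regexpr can be illustrated with a rough base-R sketch. This is not the rematch2 implementation: sketch_match is a hypothetical helper written for illustration, it assumes the pattern contains at least one named capture group, and it returns a plain data frame without the input text column.

```r
# Hypothetical helper, NOT part of rematch2: approximates the idea in base R.
# Assumes `pattern` has at least one (named) capture group.
sketch_match <- function(text, pattern) {
  m <- regexpr(pattern, text, perl = TRUE)
  start <- as.vector(m)          # drop attributes; -1 marks "no match"
  no_match <- start == -1L

  # Full match for each element, NA where nothing matched.
  whole <- substring(text, start, start + attr(m, "match.length") - 1L)
  whole[no_match] <- NA_character_

  # With perl = TRUE, regexpr reports capture groups as matrix attributes
  # (one row per element of `text`, one column per group).
  cap_start <- attr(m, "capture.start")
  cap_len <- attr(m, "capture.length")
  groups <- lapply(seq_len(ncol(cap_start)), function(i) {
    g <- substring(text, cap_start[, i], cap_start[, i] + cap_len[, i] - 1L)
    g[no_match] <- NA_character_
    g
  })
  names(groups) <- attr(m, "capture.names")

  out <- as.data.frame(groups, stringsAsFactors = FALSE)
  out[[".match"]] <- whole
  out
}

res <- sketch_match(
  c("2016-04-20", "not a date"),
  "(?<year>[0-9]{4})-(?<month>[0-9]{2})"
)
print(res)
```

Each row of the result corresponds to one input element, and elements that do not match yield NA in every column.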

Usage

re_match(text, pattern, perl = TRUE, ...)

Arguments

text

Character vector.

pattern

A regular expression. See regex for more about regular expressions.

perl

Logical: should Perl-compatible regular expressions be used? Defaults to TRUE; setting it to FALSE disables capture groups.

...

Additional arguments to pass to regexpr.

Value

A data frame of character vectors: one column per capture group, named if the group was named, plus two additional columns, .text for the input text and .match for the first matching (sub)string. Each row corresponds to an element of the text vector.

Note

re_match uses PCRE-compatible regular expressions by default (i.e. perl = TRUE in regexpr). You can switch this off, but capture groups will then no longer be reported, because only PCRE supports them.
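The dependence on PCRE can be checked directly against base R's regexpr, which re_match wraps. The following is a small sketch using only base R; has_captures is a hypothetical helper name introduced here for illustration.

```r
# Base R only: compare what regexpr reports with and without PCRE.
pattern <- "([0-9]{4})-([0-9]{2})"
input <- "2016-04-20"

with_pcre <- regexpr(pattern, input, perl = TRUE)
without_pcre <- regexpr(pattern, input, perl = FALSE)

# Hypothetical helper: TRUE if the match object carries capture group data.
has_captures <- function(m) !is.null(attr(m, "capture.start"))

has_captures(with_pcre)     # capture positions are available
has_captures(without_pcre)  # no capture information is returned
```

Both calls still locate the overall match; only the per-group positions are missing when perl = FALSE.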

See Also

Other tidy regular expression matching: re_exec_all(), re_exec(), re_match_all()

Examples

dates <- c("2016-04-20", "1977-08-08", "not a date", "2016",
  "76-03-02", "2012-06-30", "2015-01-21 19:58")
isodate <- "([0-9]{4})-([0-1][0-9])-([0-3][0-9])"
re_match(text = dates, pattern = isodate)

# The same with named groups
isodaten <- "(?<year>[0-9]{4})-(?<month>[0-1][0-9])-(?<day>[0-3][0-9])"
re_match(text = dates, pattern = isodaten)

Example output

# A tibble: 7 x 5
  ``    ``    ``    .text            .match    
  <chr> <chr> <chr> <chr>            <chr>     
1 2016  04    20    2016-04-20       2016-04-20
2 1977  08    08    1977-08-08       1977-08-08
3 <NA>  <NA>  <NA>  not a date       <NA>      
4 <NA>  <NA>  <NA>  2016             <NA>      
5 <NA>  <NA>  <NA>  76-03-02         <NA>      
6 2012  06    30    2012-06-30       2012-06-30
7 2015  01    21    2015-01-21 19:58 2015-01-21
# A tibble: 7 x 5
  year  month day   .text            .match    
  <chr> <chr> <chr> <chr>            <chr>     
1 2016  04    20    2016-04-20       2016-04-20
2 1977  08    08    1977-08-08       1977-08-08
3 <NA>  <NA>  <NA>  not a date       <NA>      
4 <NA>  <NA>  <NA>  2016             <NA>      
5 <NA>  <NA>  <NA>  76-03-02         <NA>      
6 2012  06    30    2012-06-30       2012-06-30
7 2015  01    21    2015-01-21 19:58 2015-01-21

rematch2 documentation built on May 1, 2020, 9:06 a.m.