re_match_all: Extract All Regular Expression Matches Into a Data Frame
In rematch2: Tidy Output from Regular Expression Matching



This function is a thin wrapper on the gregexpr base R function, to extract the matching (sub)strings as a data frame. It extracts all matches, and potentially their capture groups as well.
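As a rough illustration of what the wrapper builds on, the base R calls below extract every full match for each input element. This is a sketch of the underlying mechanism only, not the package's actual implementation; the names `text`, `hits`, and `all_matches` and the digit pattern are made up for the example.

```r
# Base R only: find every run of digits in each input string.
text <- c("a1b22", "no digits", "333")

# gregexpr() returns one match object per element of `text`.
hits <- gregexpr("[0-9]+", text, perl = TRUE)

# regmatches() turns those positions into the matched substrings,
# giving a list with one character vector per input element.
all_matches <- regmatches(text, hits)

print(all_matches)
```

An element with no match yields an empty character vector, so the result always has one entry per input element.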

re_match_all(text, pattern, perl = TRUE, ...)

`text`	Character vector.
`pattern`	A regular expression. See `regex` for more about regular expressions.
`perl`	Logical. Should Perl-compatible regular expressions be used? Defaults to TRUE; setting it to FALSE disables capture groups.
`...`	Additional arguments to pass to `gregexpr` (or `regexpr` if `text` is of length zero).
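The effect of `perl` on capture groups can be seen directly in base R's `gregexpr`, which only attaches capture-group attributes when `perl = TRUE`. The sketch below uses a hypothetical pattern and input, and the variable names are illustrative assumptions.

```r
# Hypothetical pattern with two capture groups.
pattern <- "([a-z]+)=([0-9]+)"
text <- "x=1, y=22"

with_perl <- gregexpr(pattern, text, perl = TRUE)[[1]]
without_perl <- gregexpr(pattern, text, perl = FALSE)[[1]]

# With perl = TRUE, capture-group positions are attached as attributes.
has_groups_perl <- !is.null(attr(with_perl, "capture.start"))

# With perl = FALSE, no capture-group attributes are returned.
has_groups_non_perl <- !is.null(attr(without_perl, "capture.start"))

print(c(perl = has_groups_perl, non_perl = has_groups_non_perl))
```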

A tidy data frame (see Section “Tidy Data”). The list columns contain character vectors with as many entries as there are matches for each input element.

The return value is a tidy data frame in which each row corresponds to an element of the input character vector text. The values from text appear for reference in the .text character column. All other columns are list columns containing the match data. The .match column holds the full regular expression matches. The remaining columns correspond to capture groups, which are present only if the pattern defines any and PCRE matching is enabled with perl = TRUE (the default). If capture groups are named, the corresponding columns bear those names.

Each match data column is a list of match records, one for each element in text. A match record is a named list with entries match, start and end, which are respectively the matching (sub)string and its start and end positions (using one-based indexing).
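To make the one-based start and end positions concrete, the following base R sketch derives them, together with the named capture groups, from the attributes of a `gregexpr` result. It is an illustration under the assumption that positions are inclusive at both ends, and it does not reproduce the package's internals; all variable names are hypothetical.

```r
text <- "Ben Franklin and Jefferson Davis"
pattern <- paste0(
  "(?<first>[[:upper:]][[:lower:]]+) ",
  "(?<last>[[:upper:]][[:lower:]]+)"
)

m <- gregexpr(pattern, text, perl = TRUE)[[1]]

# Full matches: one-based start and inclusive end positions.
start <- as.integer(m)
end <- start + attr(m, "match.length") - 1L
full <- substring(text, start, end)

# Capture groups: matrices with one row per match and one named
# column per capture group.
cap_start <- attr(m, "capture.start")
cap_end <- cap_start + attr(m, "capture.length") - 1L
first <- substring(text, cap_start[, "first"], cap_end[, "first"])
last <- substring(text, cap_start[, "last"], cap_end[, "last"])

print(data.frame(full, start, end, first, last))
```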

If the input text character vector has length zero, regexpr is called instead of gregexpr, because the latter cannot extract the number and names of the capture groups in this case.
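The difference between the two base functions on empty input can be checked with a short sketch like the one below. It assumes a pattern with two named groups, and the variable names are illustrative.

```r
pattern <- paste0(
  "(?<first>[[:upper:]][[:lower:]]+) ",
  "(?<last>[[:upper:]][[:lower:]]+)"
)
empty <- character(0)

# gregexpr() yields an empty list, so there is no match object
# from which capture-group metadata could be read.
g <- gregexpr(pattern, empty, perl = TRUE)
print(length(g))

# regexpr() yields a zero-length match object that still carries
# the capture-group names as an attribute.
r <- regexpr(pattern, empty, perl = TRUE)
print(attr(r, "capture.names"))
```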

Other tidy regular expression matching: re_exec_all(), re_exec(), re_match()

name_rex <- paste0(
  "(?<first>[[:upper:]][[:lower:]]+) ",
  "(?<last>[[:upper:]][[:lower:]]+)"
)
notables <- c(
  "  Ben Franklin and Jefferson Davis",
  "\tMillard Fillmore"
)
re_match_all(notables, name_rex)

# A tibble: 2 x 4
      first      last                              .text    .match
     <list>    <list>                              <chr>    <list>
1 <chr [2]> <chr [2]>   Ben Franklin and Jefferson Davis <chr [2]>
2 <chr [1]> <chr [1]>               "\tMillard Fillmore" <chr [1]>
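
The list columns in the printed result can be unpacked with ordinary list indexing. The sketch below repeats the example's inputs so that it runs on its own and assumes the rematch2 package is installed; `res` is a name introduced here for illustration.

```r
library(rematch2)

name_rex <- paste0(
  "(?<first>[[:upper:]][[:lower:]]+) ",
  "(?<last>[[:upper:]][[:lower:]]+)"
)
notables <- c(
  "  Ben Franklin and Jefferson Davis",
  "\tMillard Fillmore"
)
res <- re_match_all(notables, name_rex)

# Each list-column entry is a character vector with one value per match
# found in the corresponding element of `notables`.
print(res$first[[1]])
print(res$last[[1]])
print(res$.match[[2]])
```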