re_match_all: Extract All Regular Expression Matches Into a Data Frame


View source: R/all.R

Description

This function is a thin wrapper around the base R function gregexpr that returns the matching (sub)strings as a data frame. It extracts all matches and, when the pattern has capture groups, the text matched by each group as well.
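To make the description concrete, here is a minimal base R sketch of the gregexpr building block. It is not the rematch2 implementation, and the variable names and sample strings are made up for illustration. It finds every match per input string and stores the results in a list column next to the input.

```r
# Base R only; illustrates gregexpr, not rematch2's internals.
# The sample strings and the pattern are hypothetical.
text <- c("one 1 two 22", "none", "333")
pattern <- "[0-9]+"

# gregexpr returns one element per input string. Each element holds the
# start positions of all matches and carries a "match.length" attribute.
m <- gregexpr(pattern, text, perl = TRUE)

# regmatches converts those positions into the matched substrings.
all_matches <- regmatches(text, m)

# Keep the input for reference and attach the matches as a list column,
# giving one row per input string.
res <- data.frame(.text = text, stringsAsFactors = FALSE)
res$.match <- all_matches
```

An input string without any match gets an empty character vector, so the result always has as many rows as text has elements.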

Usage

re_match_all(text, pattern, perl = TRUE, ...)

Arguments

text

Character vector.

pattern

A regular expression. See regex for more about regular expressions.

perl

Logical. Should Perl-compatible regular expressions be used? Defaults to TRUE. Setting it to FALSE disables capture groups.

...

Additional arguments to pass to gregexpr (or regexpr if text is of length zero).
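The effect of the perl argument on capture groups can be seen directly in base R. The sketch below assumes only documented gregexpr behaviour: capture group positions are attached to the result as a "capture.start" attribute when perl = TRUE and are absent otherwise. The sample string and pattern are illustrative.

```r
# Base R only; the sample string and pattern are hypothetical.
text <- "Ben Franklin"
pattern <- "([A-Z][a-z]+) ([A-Z][a-z]+)"

# The same pattern run with the PCRE engine and with the default engine.
with_perl    <- gregexpr(pattern, text, perl = TRUE)[[1]]
without_perl <- gregexpr(pattern, text, perl = FALSE)[[1]]

# Both engines find the full match, but only the PCRE result carries
# the capture group attributes.
has_groups_perl    <- !is.null(attr(with_perl, "capture.start"))
has_groups_default <- !is.null(attr(without_perl, "capture.start"))

# One row per match, one column per capture group.
group_starts <- attr(with_perl, "capture.start")
```

This is why a wrapper built on gregexpr can only report capture groups when perl = TRUE.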

Value

A tidy data frame (see Section “Tidy Data”). The list columns contain character vectors with as many entries as there are matches for each input element.

Tidy Data

The return value is a tidy data frame where each row corresponds to an element of the input character vector text. The values from text appear for reference in the .text character column. All other columns are list columns containing the match data. The .match column contains the match information for full regular expression matches. The remaining columns correspond to capture groups, if the pattern has any and PCRE matching is enabled with perl = TRUE (the default). If capture groups are named, the corresponding columns bear those names.

Each match data list column contains match records, one for each element in text. A match record is a named list with entries match, start and end, which are respectively the matching (sub)string, the start position, and the end position (using one-based indexing).
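As a rough illustration of the match, start and end entries described above, the following base R sketch derives all three from a gregexpr result. The record layout is an assumption for illustration only; it shows how one-based start and end positions relate to the matched substrings, not how rematch2 builds its return value.

```r
# Base R only; the `record` list is a hypothetical layout for illustration.
text <- "  Ben Franklin and Jefferson Davis"
pattern <- "[[:upper:]][[:lower:]]+ [[:upper:]][[:lower:]]+"

m <- gregexpr(pattern, text, perl = TRUE)[[1]]

# Start positions are one-based; the end is start + length - 1.
start <- as.integer(m)
end <- start + attr(m, "match.length") - 1L

# Cutting the input at those positions recovers the matched substrings.
match <- substring(text, start, end)

record <- list(match = match, start = start, end = end)
```

Because the input begins with two spaces, the first match starts at position 3 rather than 1.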

Note

If the input text character vector has length zero, regexpr is called instead of gregexpr, because the latter cannot extract the number and names of the capture groups in this case.
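The difference between the two base R functions on empty input can be checked with a short sketch. It assumes the behaviour the note describes: gregexpr returns an empty list for a zero-length input, while regexpr still returns a (zero-length) vector that carries the capture group attributes.

```r
# Base R only; the pattern mirrors the named groups used in the Examples.
pattern <- "(?<first>[[:upper:]][[:lower:]]+) (?<last>[[:upper:]][[:lower:]]+)"
empty <- character(0)

# gregexpr yields one list element per input string, so an empty input
# gives an empty list with nothing to carry group information.
g <- gregexpr(pattern, empty, perl = TRUE)

# regexpr returns a single integer vector; its attributes describe the
# capture groups even when there are no input strings.
r <- regexpr(pattern, empty, perl = TRUE)

n_list_elements <- length(g)
group_names <- attr(r, "capture.names")
```

With the group names available, a result with the right columns and zero rows can still be constructed.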

See Also

Other tidy regular expression matching: re_exec_all(), re_exec(), re_match()

Examples

name_rex <- paste0(
  "(?<first>[[:upper:]][[:lower:]]+) ",
  "(?<last>[[:upper:]][[:lower:]]+)"
)
notables <- c(
  "  Ben Franklin and Jefferson Davis",
  "\tMillard Fillmore"
)
re_match_all(notables, name_rex)

Example output

# A tibble: 2 x 4
      first      last                              .text    .match
     <list>    <list>                              <chr>    <list>
1 <chr [2]> <chr [2]>   Ben Franklin and Jefferson Davis <chr [2]>
2 <chr [1]> <chr [1]>               "\tMillard Fillmore" <chr [1]>

rematch2 documentation built on May 1, 2020, 9:06 a.m.