regexp2df: [!+] Capture information to a dataframe by regular...
In GegznaV/spMisc: Infix Operators, Convenience and Other Kind of Miscellaneous Functions


Capture the substrings of `text` that match named and unnamed tokens (capture groups) of a regular expression and convert the result to a data frame.
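The idea can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the spMisc source: the helper `capture_df()` is hypothetical, it uses `regexpr()` (first match per element only) instead of `gregexpr()`, and it assumes that elements without a match should give `NA`.

```r
# capture_df() is a hypothetical helper, NOT the spMisc implementation.
# It keeps only the first match in each element of `text` (regexpr, not
# gregexpr) and returns NA for elements without a match (an assumption).
capture_df <- function(text, pattern, ignore.case = FALSE) {
  text <- as.character(text)
  m <- regexpr(pattern, text, ignore.case = ignore.case, perl = TRUE)
  starts <- attr(m, "capture.start")   # matrix: row per element, column per group
  lens   <- attr(m, "capture.length")  # same shape as `starts`
  if (is.null(starts)) stop("`pattern` contains no capture groups")
  no_match <- as.vector(m) == -1L
  cols <- lapply(seq_len(ncol(starts)), function(j) {
    out <- substring(text, starts[, j], starts[, j] + lens[, j] - 1L)
    out[no_match] <- NA_character_
    out
  })
  # Unnamed groups have empty names; make.names() turns them into X, X.1, ...
  names(cols) <- make.names(attr(m, "capture.names"), unique = TRUE)
  as.data.frame(cols, stringsAsFactors = FALSE)
}

capture_df(c("A_111  B_aaa", "A_222  B_bbb"),
           "A_(?<Part_A>.*)  B_(?<Part_B>.*)")
```

For these two strings the result has columns `Part_A` (`"111"`, `"222"`) and `Part_B` (`"aaa"`, `"bbb"`), the same values as in the first example in Examples.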

regexp2df(
  text,
  pattern,
  ignore.case = FALSE,
  perl = TRUE,
  stringsAsFactors = default.stringsAsFactors(),
  ...
)

`text`	The text to be parsed: a character vector where matches are sought, or an object which can be coerced by `as.character` to a character vector.
`pattern`	A Perl-like regular expression with named (`(?<name>...)`) and/or unnamed (`(...)`) tokens.
`ignore.case`	If `FALSE`, the pattern matching is case sensitive; if `TRUE`, case is ignored during matching.
`perl`	logical. Should Perl-compatible regexps be used? Named tokens are supported only with Perl-compatible matching, so keep the default `TRUE`.
`stringsAsFactors`	logical: should character columns be converted to factors? Passed to `as.data.frame`.
`...`	Other arguments to be passed to `gregexpr`.

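A data frame with one column per token of `pattern`. Columns of named tokens take the tokens' names; columns of unnamed tokens get automatic names, such as `X` and `X.1` (see Examples).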
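The automatic names for unnamed tokens agree with R's own conventions: with `perl = TRUE`, `regexpr()` reports an empty capture name for every unnamed group, and `make.names()` turns empty names into `X`, `X.1`, and so on. That `regexp2df()` builds its names through exactly this route is an assumption; the snippet only shows the base R behaviour.

```r
m <- regexpr("A_(.*)  B_(.*)", "A_111  B_aaa", perl = TRUE)

# Unnamed groups are reported with empty names
attr(m, "capture.names")
## [1] "" ""

# make.names() converts them into syntactic, unique column names
make.names(attr(m, "capture.names"), unique = TRUE)
## [1] "X"   "X.1"
```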

This function uses ideas from an answer by MrFlick on github.com.

The matching is done by `gregexpr`, which is called with `perl = TRUE` by default.
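Since `gregexpr()` finds every match, not just the first, one element of `text` can yield several sets of captures. The base R snippet below (independent of spMisc) shows the capture attributes that `gregexpr(perl = TRUE)` attaches to each list element; how `regexp2df()` arranges several matches from one string into rows is not described on this page.

```r
x <- "A_1 A_22 A_333"
g <- gregexpr("A_(?<num>\\d+)", x, perl = TRUE)[[1]]

# One row per match, one column per capture group
st  <- attr(g, "capture.start")
len <- attr(g, "capture.length")
attr(g, "capture.names")
## [1] "num"

# Cut out the text of the first (and only) group for each match
substring(x, st[, 1], st[, 1] + len[, 1] - 1L)
## [1] "1"   "22"  "333"
```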

Author: Vilmantas Gegzna; contributor: MrFlick, whose ideas published on github.com were used in this function.

More about regular expressions in R: `regex`.
A handy website for creating and testing Perl-like regular expressions (PCRE library, version 1, not 2): https://regex101.com/r/dS3iP1/1#pcre
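Which PCRE library your own R build uses for `perl = TRUE` can be checked with base R's `extSoftVersion()`. PCRE2 releases are numbered 10.x, while the older PCRE library used 8.x; the string returned depends on your installation.

```r
# PCRE version used by this R build for perl = TRUE matching
pcre_version <- extSoftVersion()[["PCRE"]]
pcre_version

# TRUE when R uses PCRE2 (release numbers 10.x), FALSE for the older PCRE (8.x)
startsWith(pcre_version, "10.")
```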

Functions `gregexpr` and `regcapturedmatches`; operator `%>%` from package magrittr.

Other spMisc utilities: bru(), clc(), clear(), fCap(), isFALSE(), list_AddRm(), make.filenames(), open_wd(), printDuration(), st01()

text1 <- c("A_111  B_aaa",
           "A_222  B_bbb",
           "A_333  B_ccc",
           "A_444  B_ddd",
           "A_555  B_eee")

# Named tokens
pattern1_named_tokens <- 'A_(?<Part_A>.*)  B_(?<Part_B>.*)'

regexp2df(text1, pattern1_named_tokens)
##     Part_A Part_B
## 1    111    aaa
## 2    222    bbb
## 3    333    ccc
## 4    444    ddd
## 5    555    eee

# Unnamed tokens: groups inside parentheses:
pattern1_unnamed_tokens <- 'A_(.*)  B_(.*)'

regexp2df(text1, pattern1_unnamed_tokens)
##       X     X.1
## 1    111    aaa
## 2    222    bbb
## 3    333    ccc
## 4    444    ddd
## 5    555    eee


#----------------------------------------------------------
# Wrong: a token's name must NOT contain spaces:


## Not run: 
pattern2 <- 'A (?<Part A>.*)  B (?<Part B>.*)'
regexp2df(text1, pattern2)

## Error ...


## End(Not run)
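# A minimal check, using a hypothetical helper that is NOT part of spMisc:
# does a pattern compile as a Perl-compatible regular expression?
# Invalid token names, such as names with spaces, make regexpr() fail.
is_valid_perl_regex <- function(pattern) {
  tryCatch({
    suppressWarnings(regexpr(pattern, "", perl = TRUE))  # compiles the pattern
    TRUE
  }, error = function(e) FALSE)
}

is_valid_perl_regex('A_(?<Part_A>.*)')   # TRUE
is_valid_perl_regex('A (?<Part A>.*)')   # FALSE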
#----------------------------------------------------------
text3 <- c("sn555 ID_O20-5-684_N52_2_Subt2_01.",
           "sn555 ID_O20-5-984_S52_8_Subt10_11.")

pattern3 <- paste0('sn(?<serial_number>.*) ID_(?<ID>.*)_(?<Class>[NS])',
                   '(?<Sector>.*)_(?<Point>.*)_[Ss]ubt.*\\.')

regexp2df(text3, pattern3)

##   serial_number        ID Class Sector Point
## 1           555 O20-5-684     N     52     2
## 2           555 O20-5-984     S     52     8
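# A variant with tighter tokens (a sketch; pattern3b is a hypothetical
# alternative, not taken from spMisc): character classes such as [^_]+ and
# \\d+ instead of .* make each token stop at the next "_" without relying
# on backtracking. Base R only: regexec(perl = TRUE) + regmatches().
text3 <- c("sn555 ID_O20-5-684_N52_2_Subt2_01.",
           "sn555 ID_O20-5-984_S52_8_Subt10_11.")   # same strings as above
pattern3b <- paste0('sn(?<serial_number>\\d+) ID_(?<ID>[^_]+)_',
                    '(?<Class>[NS])(?<Sector>\\d+)_(?<Point>\\d+)_[Ss]ubt')
m3 <- regmatches(text3, regexec(pattern3b, text3, perl = TRUE))
parts <- do.call(rbind, m3)[, -1, drop = FALSE]   # drop the full match
colnames(parts) <- c("serial_number", "ID", "Class", "Sector", "Point")
as.data.frame(parts, stringsAsFactors = FALSE)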

#----------------------------------------------------------
# List all .R files in your working directory:

regexp2df(dir(), '(?<R_file>.*\\.[rR]$)')


# Do the same using the pipe operator %>%:
library(magrittr)  # provides %>% (dplyr re-exports it, too)

dir() %>% regexp2df('(?<R_file>.*\\.[rR]$)')

#----------------------------------------------------------
# Capture several types of files:

expr <- paste0('(?<R_file>.*\\.[rR]$)|',
               '(?<Rmd_file>.*\\.[rR]md$)|',
               '(?<CSV_file>.*\\.[cC][sS][vV]$)')
dir() %>% regexp2df(expr)
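For the narrower task of listing files by extension, base R's `list.files()` already accepts a regular expression in its `pattern` argument and has an `ignore.case` argument; `regexp2df()` is useful when the matched parts should end up in data frame columns. The sketch below uses a temporary directory with made-up file names (hypothetical, chosen for illustration) so that it does not depend on your working directory.

```r
# Temporary folder with hypothetical file names
tmp <- file.path(tempdir(), "regexp2df_demo")
dir.create(tmp, showWarnings = FALSE)
invisible(file.create(file.path(tmp, c("analysis.R", "helpers.r",
                                       "report.Rmd", "data.CSV"))))

# R scripts: names ending in .R or .r
r_files <- list.files(tmp, pattern = "\\.[rR]$")
r_files

# CSV files, matched case-insensitively
csv_files <- list.files(tmp, pattern = "\\.csv$", ignore.case = TRUE)
csv_files
```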