regcaptures: Extract captured substrings
In raredd/rawr: rawr miscellaneous

Description

Extract the captured substrings from match data obtained by regexpr, gregexpr, or regexec. regcaptures2 is a convenience wrapper for regcaptures that takes the pattern directly instead of match data; see Examples.

Usage

regcaptures(x, m, use.names = TRUE)

regcaptures2(x, pattern, use.names = TRUE, simplify = FALSE)

Arguments

`x`	a character vector
`m`	an object with match data
`use.names`	logical; if `FALSE`, all names (capture names and list names) will be stripped; if `TRUE` (default) and capture groups have names, these will be used; otherwise, match start positions will be used
`pattern`	a character string containing a Perl-compatible regular expression
`simplify`	logical; if `TRUE`, the result will be coerced to a matrix. Alternatively, a prototype data frame with one column for each capture expression, in order: captured character vectors are coerced to the type of the corresponding column, and the column names are carried over to the return value; any data in the prototype are ignored
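
The prototype form of `simplify` can be pictured with a small base R sketch. This is only an illustration of coercing character captures to the column types of a prototype, not the package's implementation; the objects `caps` and `out` are hypothetical names introduced here.

```r
## hypothetical character matrix of captures (one row per string)
caps <- cbind(name = c('larry', 'alison'), age = c('35', '22'), sex = c('M', 'F'))

## prototype: only the column names and types matter, not the data
proto <- data.frame(name = character(), age = integer(), sex = character())

out <- as.data.frame(caps, stringsAsFactors = FALSE)

## coerce each captured column to the type of the matching prototype column
out[] <- Map(function(col, p) {
  if (is.factor(p))
    factor(col, levels = levels(p))
  else as.vector(col, mode = typeof(p))
}, out, proto)

str(out)
```

Under these assumptions, `age` ends up as an integer column while `name` and `sex` remain character.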

Value

A list with a matrix of captures for each string in x. Note that, unless the capture groups are named or use.names = FALSE, the column names of each matrix will be the starting positions of the captures.
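
For orientation, the capture information that such functions draw on is stored as attributes of the match data when perl = TRUE. The following is a minimal base R sketch of pulling captures out of those attributes by hand; it does not call regcaptures and is not necessarily how the package does it. The names `st`, `len`, and `caps` are introduced here for illustration.

```r
x <- c('larry:35,M', 'no data')
m <- regexpr('(?<name>.*):(?<age>\\d+)?,(?<sex>[MF])?', x, perl = TRUE)

## start positions and lengths: one row per string, one column per group
st  <- attr(m, 'capture.start')
len <- attr(m, 'capture.length')

## substring() recycles x over the matrices but drops their dimensions,
## so the shape and the group names are restored afterwards
caps <- substring(x, st, st + len - 1L)
dim(caps) <- dim(st)
colnames(caps) <- attr(m, 'capture.names')
caps
```

In this sketch, strings with no match (here 'no data') yield empty strings in every column.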

References

Adapted from https://gist.github.com/MrFlick/10413321

Examples

x <- c('larry:35,M', 'alison:22,F', 'dave:,M', 'lily:55,F', 'no data')
p1 <- '(.*):(\\d+)?,([MF])?'
p2 <- '(?<name>.*):(?<age>\\d+)?,(?<sex>[MF])?'

m <- regexpr(p1, x, perl = TRUE)
regcaptures(x, m)

## regcaptures2 is a convenience function for the two-step above
regcaptures2(x, p1)
regcaptures2(x, p1, simplify = TRUE)

## both will use named captures (if perl = TRUE)
regcaptures(x, gregexpr(p2, x, perl = TRUE))
regcaptures2(x, p2, simplify = TRUE)


## use simplify = proto
proto <- data.frame(name = character(), age = integer(), sex = character())
regcaptures2(x, p1, simplify = proto)

proto <- data.frame(name = '', age = NA_integer_, sex = factor('', c('M', 'F')))
regcaptures2(x, p1, simplify = proto)


## capture overlapping matches
x <- 'ACCACCACCCAC'
m <- gregexpr('(?=([AC]C))', x, perl = TRUE)
regcaptures(x, m)[[1]]

m <- gregexpr('(?=(CC))', x, perl = TRUE)
regcaptures(x, m)[[1]]

## compare:
mapply(function(xx) substr(x, xx, xx + 1L), m[[1]])

raredd/rawr documentation built on June 14, 2025, 1:26 p.m.