regcaptures: Extract captured substrings

View source: R/utils.R


Description

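Extract the captured substrings from match data obtained by regexpr, gregexpr, or regexec. Note that regexpr and gregexpr only record capture positions (the capture.start, capture.length, and capture.names attributes) when called with perl = TRUE. regcaptures2 is a convenience wrapper that performs the match for pattern and passes the result to regcaptures; see examples.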
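To make the mechanism concrete, here is a minimal base-R sketch of pulling capture groups out of gregexpr(..., perl = TRUE) match data. It is not the rawr implementation: the helper name captures_from_gregexpr is hypothetical, and its layout (one matrix per string, one row per match, one column per capture group) is an assumption chosen for illustration.

```r
# Hypothetical helper, not part of rawr: a base-R sketch of extracting
# capture groups from gregexpr(..., perl = TRUE) match data.
captures_from_gregexpr <- function(x, m) {
  stopifnot(is.character(x), is.list(m), length(x) == length(m))
  mapply(function(s, mi) {
    grp <- attr(mi, "capture.names")        # "" for unnamed groups
    if (mi[1L] == -1L) {
      # no match in this string: zero rows, one column per group
      return(matrix(character(0), nrow = 0L, ncol = length(grp),
                    dimnames = list(NULL, grp)))
    }
    st  <- attr(mi, "capture.start")        # rows = matches, cols = groups
    len <- attr(mi, "capture.length")
    out <- substring(s, st, st + len - 1L)  # recycles s over every cell
    matrix(out, nrow = nrow(st), dimnames = list(NULL, grp))
  }, x, m, SIMPLIFY = FALSE, USE.NAMES = FALSE)
}

x <- c("a1b2", "zz")
res <- captures_from_gregexpr(x, gregexpr("([a-z])(\\d)", x, perl = TRUE))
res[[1]]
```

The first string matches twice ("a1" and "b2"), so its matrix has two rows; "zz" does not match and yields a zero-row matrix.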

Usage

regcaptures(x, m, use.names = TRUE)

regcaptures2(x, pattern, use.names = TRUE, simplify = FALSE)

Arguments

x

a character vector (for regcaptures, the vector from which m was computed)

m

match data, as returned by regexpr, gregexpr (both with perl = TRUE), or regexec

use.names

logical; if TRUE (default), capture group names are used when the groups are named, and match start positions are used otherwise; if FALSE, all names (capture names and list names) are stripped
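Where named capture groups come from: with perl = TRUE, base R reports the names of (?<name>...) groups in the capture.names attribute of the match data. The sketch below uses only base R (it is not rawr's code) and builds a small capture matrix by hand, then drops all names with unname as a rough analogue of use.names = FALSE.

```r
# Named groups (?<name>...) are reported in the capture.names attribute
# when perl = TRUE; this is plain base R, not rawr's implementation.
x <- c("larry:35,M", "alison:22,F")
m <- regexpr("(?<name>.*):(?<age>\\d+)?,(?<sex>[MF])?", x, perl = TRUE)

grp <- attr(m, "capture.names")   # "name" "age" "sex"
st  <- attr(m, "capture.start")   # one row per string, one column per group
len <- attr(m, "capture.length")

caps <- matrix(substring(x, st, st + len - 1L), nrow = length(x),
               dimnames = list(NULL, grp))
caps[, "age"]        # columns addressable by group name
unname(caps)         # analogous to use.names = FALSE: every name dropped
```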

pattern

a character string containing a Perl-compatible regular expression

simplify

logical or a data frame; if TRUE, the result is coerced to a matrix. Alternatively, a data frame prototype with one column per capture group, in order: the captured character vectors are coerced to the type of the corresponding column, the column names are carried over to the return value, and any rows in the prototype are ignored
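The prototype idea is the same one used by base R's utils::strcapture (listed under See Also). The sketch below shows that base function, not regcaptures2 itself: each capture is coerced to the class of the matching prototype column, so age comes back as an integer. The pattern here drops the optional groups of the examples to keep the illustration simple.

```r
# Base R's utils::strcapture() applies a data frame prototype in a
# similar spirit; this is an illustration, not regcaptures2 itself.
x <- c("larry:35,M", "alison:22,F", "lily:55,F")
proto <- data.frame(name = character(), age = integer(), sex = character())
res <- utils::strcapture("(.*):(\\d+),([MF])", x, proto = proto, perl = TRUE)
str(res)  # age is converted to integer because proto$age is integer
```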

Value

A list with a matrix of captures for each string in x. Unless capture group names are used (see use.names), the column names of each matrix are the starting positions of the captures.

References

Adapted from https://gist.github.com/MrFlick/10413321

See Also

regmatches; strcapture

Examples

x <- c('larry:35,M', 'alison:22,F', 'dave:,M', 'lily:55,F', 'no data')
p1 <- '(.*):(\\d+)?,([MF])?'
p2 <- '(?<name>.*):(?<age>\\d+)?,(?<sex>[MF])?'

m <- regexpr(p1, x, perl = TRUE)
regcaptures(x, m)

## regcaptures2 is a convenience function for the two-step above
regcaptures2(x, p1)
regcaptures2(x, p1, simplify = TRUE)

## both will use named captures (if perl = TRUE)
regcaptures(x, gregexpr(p2, x, perl = TRUE))
regcaptures2(x, p2, simplify = TRUE)


## use simplify = proto
proto <- data.frame(name = character(), age = integer(), sex = character())
regcaptures2(x, p1, simplify = proto)

proto <- data.frame(name = '', age = NA_integer_, sex = factor('', c('M', 'F')))
regcaptures2(x, p1, simplify = proto)


## capture overlapping matches
x <- 'ACCACCACCCAC'
m <- gregexpr('(?=([AC]C))', x, perl = TRUE)
regcaptures(x, m)[[1]]

m <- gregexpr('(?=(CC))', x, perl = TRUE)
regcaptures(x, m)[[1]]

## compare:
substring(x, m[[1]], m[[1]] + 1L)
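The overlapping-match examples rely on a standard regex trick: a zero-width lookahead such as (?=(CC)) consumes no characters, so gregexpr can report a match at every position, while the capture group inside the lookahead records the text it saw. A base-R sketch of the same idea (not rawr's code), contrasted with an ordinary pattern whose matches cannot overlap:

```r
# Zero-width lookahead: each match has length 0, but the capture group
# inside it records the two characters that follow each match position.
x <- "ACCACCACCCAC"
m <- gregexpr("(?=(CC))", x, perl = TRUE)[[1]]

starts <- as.integer(attr(m, "capture.start")[, 1])
lens   <- as.integer(attr(m, "capture.length")[, 1])
starts                                   # 2 5 8 9
substring(x, starts, starts + lens - 1L) # "CC" "CC" "CC" "CC"

# Without the lookahead, matches cannot overlap, so the "CC" at 9 is missed
as.integer(gregexpr("CC", x)[[1]])       # 2 5 8
```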


raredd/rawr documentation built on March 4, 2024, 1:36 a.m.