regexprMatches: Extract matched substrings

View source: R/stringUtils.R

regexprMatches    R Documentation

Extract matched substrings

Description

In most cases you want regexprCapture instead, since it applies a regular expression with capture groups in one step. This function only parses an already generated match result; it is used by regexprCapture.

Usage

regexprMatches(matchResults, matchText, use.na = FALSE)

Arguments

matchResults

The result of a match performed using regexpr(regExp, matchText, perl = TRUE), where regExp contains capture groups or named capture groups, such as ([^:]*) or (?<beforeColon>[^:]*). Will not work with perl = FALSE.

matchText

The text originally matched against, a vector of strings.

use.na

By default, empty strings are returned for all capture groups if regExp fails to match. That cannot be distinguished from a successful match in which every capture group matches nothing, e.g. (?<num>\d*). Setting this to TRUE causes a failing match to return NA for every capture group instead.
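The following base R sketch illustrates what a suitable matchResults value carries and why use.na exists. It does not call regexprMatches; the pattern, the sample strings, and the variable names (captureStart, captureLength, matched, captured) are assumptions chosen for illustration.

```r
# Three inputs: a non-empty capture, an empty capture, and a failed match.
matchText <- c("id=42", "id=", "none")
matchResults <- regexpr("id=(?<num>\\d*)", matchText, perl = TRUE)

# Capture data is attached as attributes only when perl = TRUE.
captureStart  <- attr(matchResults, "capture.start")
captureLength <- attr(matchResults, "capture.length")

# regexpr reports -1 for strings where the whole pattern failed to match.
matched <- as.integer(matchResults) != -1L

# Cut out the captured text; an empty capture and a failed match both give "".
captured <- substring(matchText, captureStart, captureStart + captureLength - 1)

# Marking failures as NA removes the ambiguity (the idea behind use.na = TRUE).
captured[!matched] <- NA_character_

# With perl = FALSE no capture attributes are produced at all.
noCaptureData <- is.null(attr(regexpr("id=([0-9]*)", matchText), "capture.start"))
```

Before the NA assignment, the second and third elements of captured are both empty strings; only the match position tells them apart.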

Details

Extracts the substrings matched by capture groups from a provided match result, i.e. from the output of regexpr with perl = TRUE. Returns a matrix of strings with one column for each capture group and one row for each string in the vector matched against. By default, empty strings are returned if the match fails, but NAs can be returned instead if desired. Named capture groups are supported; matrix columns are named accordingly.
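A minimal sketch of how such an extraction could be done in base R is shown here. It is not the package's implementation; extractCaptures is a hypothetical helper name, and the sketch assumes the capture.start, capture.length, and capture.names attributes that regexpr attaches when perl = TRUE.

```r
# Hypothetical helper, for illustration only (not the package source).
extractCaptures <- function(matchResults, matchText, use.na = FALSE) {
  starts  <- attr(matchResults, "capture.start")
  lengths <- attr(matchResults, "capture.length")
  if (is.null(starts)) {
    stop("matchResults has no capture data; was regexpr called with perl = TRUE?")
  }
  ends <- starts + lengths - 1

  # substring() recycles matchText down each column of the start/end matrices,
  # so element (i, j) is capture group j taken from string i.
  out <- matrix(substring(matchText, starts, ends),
                nrow = length(matchText), ncol = ncol(starts))
  colnames(out) <- attr(matchResults, "capture.names")

  # A failed match yields "" in every column; optionally replace with NA.
  if (use.na) {
    failed <- as.integer(matchResults) == -1L
    out[failed, ] <- NA_character_
  }
  out
}

regExp <- "(?<key>.+?)\\s*=\\s*(?<value>.+)"
data <- c("name = Stuart R. Jefferys", "no separator here")
m <- regexpr(regExp, data, perl = TRUE)
withEmpty <- extractCaptures(m, data)
withNA    <- extractCaptures(m, data, use.na = TRUE)
```

Here withEmpty has an all-empty second row, while withNA has an all-NA second row; the first row is the same in both.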

This is intended for use with regexpr to parse a string and extract substrings via capture groups, similar to how regmatches is used. If only one string is matched against, the returned matrix has a single row.

Note that a regExp with multiple capture groups needs to combine greedy and non-greedy matching carefully, so that one capture group does not consume text intended for another.
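As a small illustration of that interference, the sketch below compares a greedy and a non-greedy first group on the same input using base R only. The helper name captures and the sample string are assumptions made for this example.

```r
# Hypothetical helper: return the capture groups of one string as a named vector.
captures <- function(pattern, text) {
  m <- regexpr(pattern, text, perl = TRUE)
  starts <- attr(m, "capture.start")
  ends <- starts + attr(m, "capture.length") - 1
  setNames(substring(text, starts, ends), attr(m, "capture.names"))
}

# Greedy key: .+ grabs as much as possible, so it swallows the first "=".
greedy <- captures("(?<key>.+)=(?<value>.+)", "a=b=c")

# Non-greedy key: .+? stops at the first "=", leaving the rest for value.
lazy <- captures("(?<key>.+?)=(?<value>.+)", "a=b=c")
```

With the greedy pattern key is "a=b" and value is "c"; with the non-greedy pattern key is "a" and value is "b=c".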

Value

A matrix with one column for each capture group (with a matching column name for named capture groups) and one row for each string in the text vector matched against. The value of each cell is the text matched by the corresponding capture group. If any capture group does not match, all returned strings are empty for that text vector element (row), or NA if use.na = TRUE.

See Also

regexprCapture, regex

Examples

regExp <- "(?<key>.+?)\\s*=\\s*(?<value>.+)"
data <- c('name = Stuart R. Jefferys', 'email=srj@unc.edu')
matchResults <- regexpr(regExp, data, perl= TRUE)
regexprMatches(matchResults, data)
#=>      key     value
#=> [1,] "name"  "Stuart R. Jefferys"
#=> [2,] "email" "srj@unc.edu"


jefferys/JefferysRUtils documentation built on Jan. 12, 2024, 9:18 p.m.