regexprNamedMatches: Extract matched named substrings


Description

Extracts the substrings matched by named capture groups from the match result returned by regexpr (called with perl= TRUE). Returns a matrix of strings with one column for each named capture group and one row for each string in the vector matched against. By default, a failed match yields empty strings, but the function can be set to return NAs instead.

Usage

regexprNamedMatches(matchResults, matchText, use.na = FALSE)

Arguments

matchResults

The results of a match performed using regexpr(regExp, matchText, perl= TRUE) where regExp has named capture groups like (?<theName>...).

matchText

The text originally matched against, a vector of strings.

use.na

By default, empty strings are returned for all capture groups if the regExp fails to match. This cannot be distinguished from a successful match in which every capture group matches nothing, e.g. (?<num>\d*). Setting this to TRUE causes a failing match to return all NA values instead.
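The ambiguity can be seen in the raw regexpr output. The following is a minimal sketch using only base R; the pattern and input strings are made up for illustration. A failed match is reported as -1, while a successful match with an empty capture group has a capture length of 0.

```r
# Illustrative pattern and inputs (assumptions, not taken from the package).
# "x" matches with an empty <num> capture; "y" does not match at all.
m <- regexpr("x(?<num>\\d*)", c("x", "y"), perl = TRUE)

# A match position of -1 signals a failed match.
matched <- as.integer(m) != -1L

# Capture length is 0 for an empty capture and -1 for a failed match.
capLength <- as.integer(attr(m, "capture.length")[, 1])
```

Both rows would hold an empty string under the default described above, so use.na = TRUE is the way to tell the two cases apart in the returned matrix.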

Details

This is intended for use with regexpr to parse a string and extract substrings via named capture groups, similar to how regmatches is used. If only one string is matched against, the returned matrix has only one row.
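For orientation, here is a rough sketch of how such a matrix could be built from the capture.start, capture.length and capture.names attributes that regexpr attaches when perl = TRUE. The helper namedCaptureSketch is hypothetical and is not the package's implementation; it assumes every capture group in the pattern is named.

```r
# Hypothetical helper for illustration only; not the package's code.
namedCaptureSketch <- function(matchResults, matchText, use.na = FALSE) {
  starts     <- attr(matchResults, "capture.start")
  lengths    <- attr(matchResults, "capture.length")
  groupNames <- attr(matchResults, "capture.names")

  # One row per input string, one column per capture group.
  out <- matrix("", nrow = length(matchText), ncol = length(groupNames),
                dimnames = list(NULL, groupNames))
  for (j in seq_along(groupNames)) {
    out[, j] <- substring(matchText, starts[, j], starts[, j] + lengths[, j] - 1)
  }

  # Rows where the whole regular expression failed to match.
  failed <- as.integer(matchResults) == -1L
  out[failed, ] <- if (use.na) NA_character_ else ""
  out
}

regExp <- "(?<key>.+?)\\s*=\\s*(?<value>.+)"
data <- c("name = Stuart R. Jefferys", "email=srj@unc.edu", "no separator here")
res <- namedCaptureSketch(regexpr(regExp, data, perl = TRUE), data, use.na = TRUE)
```

Here the third string contains no "=", so its row in res is all NA; with use.na = FALSE it would be all empty strings.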

Note that a regExp with multiple capture groups needs to use greedy and non-greedy matching carefully if the capture groups are to work correctly and not interfere with each other or with the non-capturing components of the regExp.
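As a small illustration of this interaction, the sketch below compares a non-greedy and a greedy key group on the same input, using only base R. The helper extractCaptures and the input string are assumptions made for this example.

```r
# Hypothetical helper: pull named captures for a single string from the
# attributes that regexpr() attaches when perl = TRUE.
extractCaptures <- function(pattern, text) {
  m <- regexpr(pattern, text, perl = TRUE)
  s <- attr(m, "capture.start")
  l <- attr(m, "capture.length")
  res <- substring(text, s, s + l - 1)
  names(res) <- attr(m, "capture.names")
  res
}

text <- "a = b = c"

# Non-greedy key: stops at the first "=", leaving the rest for value.
lazy <- extractCaptures("(?<key>.+?)\\s*=\\s*(?<value>.+)", text)

# Greedy key: runs to the last "=" and also swallows the space before it,
# so the following \\s* has nothing left to absorb.
greedy <- extractCaptures("(?<key>.+)\\s*=\\s*(?<value>.+)", text)
```

With the non-greedy group, key is "a" and value is "b = c". With the greedy group, key becomes "a = b " (including the trailing space) and value is only "c".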

Value

A matrix with one column for each named capture group (with matching column name) and one row for each string in the text vector matched against. The value of each cell is the text matched by the named capture group. If any capture group does not match, all returned strings for that text vector element (row) are empty, or NA if use.na= TRUE.

Examples

regExp <- "(?<key>.+?)\\s*=\\s*(?<value>.+)"
data <- c('name = Stuart R. Jefferys', 'email=srj@unc.edu')
matchResults <- regexpr(regExp, data, perl= TRUE)
regexprNamedMatches(matchResults, data)
#=>      key     value
#=> [1,] "name"  "Stuart R. Jefferys"
#=> [2,] "email" "srj@unc.edu"

jefferys/fusionExpressionPlot documentation built on May 19, 2019, 3:59 a.m.