str_match_named: First match from multiple subjects, three argument syntax

str_match_named    R Documentation

First match from multiple subjects, three argument syntax

Description

Extract the first match of pattern from each element of subject.vec using a named capture regular expression. This function is mostly for internal use; most users should use str_match_variable instead. The result depends on the regex engine (either PCRE or RE2), which can be selected via the namedCapture.engine option.
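To make "first match per subject" concrete, here is a minimal base-R sketch, assuming the PCRE engine as exposed by regexpr(perl = TRUE). The helper name first_named_match is hypothetical and is not part of namedCapture; the package's actual implementation may differ, for example when the RE2 engine is selected.

```r
## Hypothetical helper (NOT part of namedCapture): a base-R sketch of
## extracting the first match of a named capture pattern from each subject,
## using PCRE via regexpr(perl = TRUE). The pattern must contain capture groups.
first_named_match <- function(subject.vec, pattern) {
  ## regexpr() finds only the FIRST match in each subject.
  m <- regexpr(pattern, subject.vec, perl = TRUE)
  starts <- attr(m, "capture.start")       # n x k matrix of group start positions
  lengths <- attr(m, "capture.length")     # n x k matrix of group lengths
  group.names <- attr(m, "capture.names")  # names of the k capture groups
  ## substring() recycles subject.vec down the columns of the n x k matrices.
  out <- substring(subject.vec, starts, starts + lengths - 1L)
  out <- matrix(out, nrow = length(subject.vec),
                dimnames = list(names(subject.vec), group.names))
  ## Subjects that are NA or do not match get a row of NA.
  out[is.na(m) | m == -1L, ] <- NA_character_
  out
}

subjects <- c("chr10:213,054,000-213,055,000", "this will not match", NA,
              "chr1:110-111 chr2:220-222")
pat <- "(?P<chrom>chr.*?):(?P<chromStart>.*?)-(?P<chromEnd>[0-9,]*)"
first_named_match(subjects, pat)
```

In this sketch the last subject yields only its first match (chr1), and the rows for the non-matching and NA subjects are all NA.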

Usage

str_match_named(subject.vec, pattern, type.list = NULL)

Arguments

subject.vec

character vector of subjects.

pattern

named capture regular expression (character vector of length 1).

type.list

named list of functions to apply to captured groups, in order to create non-character (typically numeric) columns in the result.
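The sketch below illustrates how such a type.list could turn a character match matrix into a data.frame with typed columns. The helper apply_type_list is hypothetical (not a namedCapture function) and only assumes that each list name matches a capture group name; groups without a function stay character.

```r
## Hypothetical helper (NOT part of namedCapture): apply a named list of
## conversion functions to the matching columns of a character match matrix.
apply_type_list <- function(match.mat, type.list) {
  df <- data.frame(match.mat, stringsAsFactors = FALSE, check.names = FALSE)
  for (col.name in names(type.list)) {
    ## Replace the character column with the function's return value.
    df[[col.name]] <- type.list[[col.name]](df[[col.name]])
  }
  df
}

## Same conversion function as in the Examples: drop commas, convert to integer.
keep.digits <- function(x) as.integer(gsub("[^0-9]", "", x))
match.mat <- matrix(c("chr10", "213,054,000", "213,055,000"), nrow = 1,
                    dimnames = list(NULL, c("chrom", "chromStart", "chromEnd")))
typed.df <- apply_type_list(match.mat,
                            list(chromStart = keep.digits, chromEnd = keep.digits))
str(typed.df)
```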

Value

If type.list is a list of functions, a data.frame with one row for each subject and one column for each capture group; otherwise, a character matrix with the same rows and columns. If subject.vec has names, they are used as the rownames of the returned data.frame or character matrix. Otherwise, if pattern has a group named "name", that group is not returned as a column and is instead used for the rownames.
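The rowname rule above can be sketched in base R as follows. The helper use_name_rownames is hypothetical and only mirrors the documented behavior; it is not the package's own code.

```r
## Hypothetical helper (NOT part of namedCapture): mirror the documented
## rowname rule on a character match matrix.
use_name_rownames <- function(match.mat, subject.vec) {
  if (!is.null(names(subject.vec))) {
    ## Names of the subjects take priority.
    rownames(match.mat) <- names(subject.vec)
  } else if ("name" %in% colnames(match.mat)) {
    ## Otherwise a group called "name" becomes the rownames and is dropped.
    rownames(match.mat) <- match.mat[, "name"]
    match.mat <- match.mat[, colnames(match.mat) != "name", drop = FALSE]
  }
  match.mat
}

mat <- matrix(c("chr1", "a", "chr2", "b"), nrow = 2, byrow = TRUE,
              dimnames = list(NULL, c("chrom", "name")))
use_name_rownames(mat, c("s1", "s2"))          # "name" group used as rownames
use_name_rownames(mat, c(x = "s1", y = "s2"))  # subject names used as rownames
```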

Author(s)

Toby Dylan Hocking

Examples


chr.pos.vec <- c(
  "chr10:213,054,000-213,055,000",
  "chrM:111,000-222,000",
  "this will not match",
  NA, # neither will this.
  "chr1:110-111 chr2:220-222") # two possible matches.
chr.pos.pattern <- paste0(
  "(?P<chrom>chr.*?)",
  ":",
  "(?P<chromStart>.*?)",
  "-",
  "(?P<chromEnd>[0-9,]*)")
## Specifying a list of conversion functions means that str_match_*
## should convert the matched groups from character to whatever is
## returned by those functions.
## Remove commas (and any other non-digits), then convert to integer.
keep.digits <- function(x) as.integer(gsub("[^0-9]", "", x))
conversion.list <- list(chromStart=keep.digits, chromEnd=keep.digits)
(match.df <- namedCapture::str_match_named(chr.pos.vec, chr.pos.pattern, conversion.list))
str(match.df)


tdhock/namedCapture documentation built on Jan. 27, 2024, 9:02 p.m.