str_match_named: First match from multiple subjects, three argument syntax

Description Usage Arguments Value Author(s) Examples

Description

Extract the first match of pattern from each element of subject.vec using a named capture regular expression. This function is mostly for internal use; most users should use str_match_variable instead. Result depends on engine (either PCRE or RE2) which can be specified via the namedCapture.engine option.

Usage

1
2
str_match_named(subject.vec, 
    pattern, type.list = NULL)

Arguments

subject.vec

character vector of subjects.

pattern

named capture regular expression (character vector of length 1).

type.list

named list of functions to apply to captured groups, in order to create non-character (typically numeric) columns in the result.

Value

A data.frame with one row for each subject and one column for each capture group if type.list is a list of functions. Otherwise a character matrix. If subject.vec has names then they will be used for the rownames of the returned data.frame or character matrix. Otherwise if pattern has a group named "name" then it will not be returned as a column, and will instead be used for the rownames.

Author(s)

Toby Dylan Hocking

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
chr.pos.vec <- c(
  "chr10:213,054,000-213,055,000",
  "chrM:111,000-222,000",
  "this will not match",
  NA, # neither will this.
  "chr1:110-111 chr2:220-222") # two possible matches.
chr.pos.pattern <- paste0(
  "(?P<chrom>chr.*?)",
  ":",
  "(?P<chromStart>.*?)",
  "-",
  "(?P<chromEnd>[0-9,]*)")
## Specifying a list of conversion functions means that str_match_*
## should convert the matched groups from character to whatever is
## returned by those functions.
keep.digits <- function(x)as.integer(gsub("[^0-9]", "", x))
conversion.list <- list(chromStart=keep.digits, chromEnd=keep.digits)
(match.df <- namedCapture::str_match_named(chr.pos.vec, chr.pos.pattern, conversion.list))
str(match.df)

namedCapture documentation built on April 2, 2020, 1:07 a.m.