str_match_all_named: All matches from multiple subjects, three argument syntax


Description

Extract all matches of pattern from each element of subject.vec, using a regular expression with named capture groups. For the common case of extracting all matches of a regex from a multi-line text file, use str_match_all_variable instead. The result depends on the regex engine (PCRE or RE2), which can be specified via the namedCapture.engine option.

Usage

str_match_all_named(subject.vec, 
    pattern, type.list = NULL)

Arguments

subject.vec

character vector of subjects.

pattern

named capture regular expression (character vector of length 1).

type.list

named list of functions to apply to captured groups, in order to create non-character (typically numeric) columns in the result.

Value

A list with one element for each subject. If type.list is a list of functions, each element is a data.frame with one row for each match and one column for each capture group; otherwise each element is a character matrix with the same layout. If pattern contains a group named "name", it is not returned as a column; its values are used as the rownames of the data.frames or matrices instead. If subject.vec has names, they are used as the names of the returned list.
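As a minimal sketch of the return structure described above, the following assumes the namedCapture package is installed. The subject names (first, second), the pattern, and the variable names are made up for illustration and do not come from the package documentation.

```r
library(namedCapture)

## Hypothetical named subjects: the first contains two matches,
## the second contains none.
subject.vec <- c(first="a=1 b=2", second="no match here")

## The group called "name" is used for rownames rather than
## being returned as a column.
pattern <- "(?P<name>[a-z])=(?P<value>[0-9])"

## Passing a type.list requests data.frames instead of character
## matrices; here the "value" group is converted to integer.
result <- str_match_all_named(
  subject.vec, pattern, list(value=as.integer))

names(result)   # names are taken from subject.vec
str(result)     # one element per subject, one row per match
```

Calling str_match_all_named(subject.vec, pattern) without the third argument should return character matrices instead of data.frames.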

Author(s)

Toby Dylan Hocking

Examples

chr.pos.vec <- c(
  "chr10:213,054,000-213,055,000",
  "chrM:111,000-222,000",
  "this will not match",
  NA, # neither will this.
  "chr1:110-111 chr2:220-222") # two possible matches.
chr.pos.pattern <- paste0(
  "(?P<chrom>chr.*?)",
  ":",
  "(?P<chromStart>.*?)",
  "-",
  "(?P<chromEnd>[0-9,]*)")
## Specifying a list of conversion functions means that str_match_*
## should convert the matched groups from character to whatever is
## returned by those functions.
keep.digits <- function(x)as.integer(gsub("[^0-9]", "", x))
conversion.list <- list(chromStart=keep.digits, chromEnd=keep.digits)
## Use str_match_all_named to get ALL matches in each subject (not
## just the first match).
(match.df.list <- namedCapture::str_match_all_named(
  chr.pos.vec, chr.pos.pattern, conversion.list))
str(match.df.list)
## If there is a capture group named "name" then it will be used for
## the rownames of the result.
name.value.vec <- c(
  H3K27me3="  sampleType=monocyte   assayType=H3K27me3    cost=5",
  H3K27ac="sampleType=monocyte assayType=H3K27ac",
  H3K4me3=" sampleType=Myeloidcell cost=30.5  assayType=H3K4me3")
name.value.pattern <- paste0(
  "(?P<name>[^ ]+?)",
  "=",
  "(?P<value>[^ ]+)")
namedCapture::str_match_all_named(name.value.vec, name.value.pattern)

namedCapture documentation built on April 2, 2020, 1:07 a.m.