Regexp Match Operator


Description

A grep/sub-like function that returns one or more back-referenced pattern matches, either as a vector (one back reference) or as columns of a data frame (several back references). Unlike sub, this function is geared towards data extraction rather than data cleaning. The name is derived from the popular Perl regular expression 'match' operator 'm' (e.g. 'extraction =~ m/sought_text/').
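The difference between cleaning and extraction can be sketched in base R alone. The snippet below is an illustration under stated assumptions, not this function's implementation: the input vector and the variable names `cleaned`, `hits` and `extracted` are made up for the example, and only base R's `sub`, `regexec` and `regmatches` are used.

```r
x   <- c("asdf.ABCD.asdf", "asdf.AS.fds")
pat <- "asdf\\.([A-Z]{4})\\."

# Cleaning with sub(): the whole string is rewritten to the back reference,
# but a string that does not match is passed through unchanged.
cleaned <- sub(paste0("^.*", pat, ".*$"), "\\1", x)

# Extraction with regexec() + regmatches(): each element holds the full match
# followed by the captured groups, or character(0) when nothing matched.
hits <- regmatches(x, regexec(pat, x))

# Keep the first captured group; report NA where the pattern was not found.
extracted <- vapply(
  hits,
  function(h) if (length(h) >= 2) h[2] else NA_character_,
  character(1)
)

print(cleaned)
print(extracted)
```

With `sub` a failed match is indistinguishable from a value that happened to need no change, whereas the extraction approach marks it explicitly as missing.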

Usage

m(pattern, vect, names="V", types="character", mismatch=NA, ...)

Arguments

pattern

A regular expression pattern with at least one back reference.

vect

A string or vector of strings on which to apply the pattern match.

names

The vector of names of the new variables to be created out of vect. Should have one entry per back reference in pattern (the data frame example below supplies two names for two back references).

types

The vector of types of the new variables to be created out of vect. Should have one entry per back reference in pattern, in the same order as names.

mismatch

What to do when no pattern is found. NA returns NA; TRUE returns the original value (currently only implemented for single-match, vector returns).

...

other parameters passed on to grep

Value

Either a vector or a dataframe depending on the number of backreferences in the pattern.
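How the number of back references could determine the shape of the result is sketched below in base R. `extract_groups` is a hypothetical helper written for this illustration; it is not the package's code, and its handling of names, types and non-matches is an assumption.

```r
# Hypothetical helper (illustration only): one output column per back reference.
extract_groups <- function(pattern, x, names, types) {
  hits <- regmatches(x, regexec(pattern, x))
  n <- length(names)
  # Drop the full match (element 1); fill non-matching rows with NA.
  rows <- lapply(hits, function(h) {
    if (length(h) == n + 1) h[-1] else rep(NA_character_, n)
  })
  mat <- do.call(rbind, rows)
  # Convert each column with the matching as.<type>() function.
  cols <- lapply(seq_len(n), function(i) match.fun(paste0("as.", types[i]))(mat[, i]))
  names(cols) <- names
  out <- as.data.frame(cols, stringsAsFactors = FALSE)
  # A single back reference collapses to a plain vector.
  if (n == 1) out[[1]] else out
}

# Two back references: data frame with columns 'make' and 'model'.
cars <- c("Mazda RX4", "Hornet 4 Drive", "Valiant")
res <- extract_groups("^([A-Za-z]+) ?(.*)$", cars,
                      names = c("make", "model"),
                      types = c("character", "character"))
print(res)

# One back reference: plain character vector, NA where nothing matched.
vec <- extract_groups("http://([a-z]+)\\.r-project\\.org",
                      c("http://cran.r-project.org", "ftp://example"),
                      names = "V", types = "character")
print(vec)
```

The sketch assumes that `names` and `types` each carry one entry per back reference, which is how the data frame example in this page calls the function.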

See Also

sub, gsub, regexpr, grep, gregexpr.

Examples

## single vector output examples
m(pattern="asdf.([A-Z]{4}).", 
  vect=c('asdf.AS.fds','asdf.ABCD.asdf', '12.ASDF.asdf','asdf.REWQ.123'))


Rurls <- c('http://www.r-project.org',    'http://cran.r-project.org',
           'http://journal.r-project.org','http://developer.r-project.org')
m(pattern="http://([a-z]+)\\.r-project\\.org", vect=Rurls)


## data frame output example

data(mtcars)
m(pattern="^([A-Za-z]+) ?(.*)$", 
  vect=rownames(mtcars), names=c('make','model'), types=rep('character',2))
