extractMatches: Extract substrings that match a regex


Description

Extracts the substrings matched by parenthesized groups in a regular expression.

Usage

extractMatches(data, pattern, var)

Arguments

data

a data frame; it can also be supplied through a pipe.

pattern

the regex to be used for matching

var

the name of the variable from which to extract the substrings

Details

Use parentheses to mark the parts of a regular expression whose matching substrings should be extracted. For instance, the pattern "(.*)[aeiouy]$" matches every string that ends in a vowel (counting y), and the "(.*)" group extracts all the characters before that final vowel. NA is returned for strings that do not match.
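The capture-group behavior described here can be tried out in base R without the package. The following is a minimal sketch using regexec() and regmatches(); it is not the DCF implementation, and the vector kid_names is made-up illustration data.

```r
# Base R sketch of capture-group extraction (not the DCF implementation).
kid_names <- c("Anna", "Josh", "Mike")   # hypothetical example data
pattern <- "(.*)[aeiouy]$"

# regexec() locates the full match and each parenthesized group;
# regmatches() converts those positions into substrings.
hits <- regmatches(kid_names, regexec(pattern, kid_names))

# Element 1 of each result is the full match, element 2 the first group.
# Strings with no match give character(0), which is mapped to NA here.
root <- vapply(
  hits,
  function(h) if (length(h) >= 2) h[2] else NA_character_,
  character(1),
  USE.NAMES = FALSE
)

print(root)
# "Ann" NA "Mik"
```

"Josh" does not end in a vowel, so it yields NA, matching the behavior described for non-matching strings.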

Value

A data frame that copies the original but adds a new variable giving the substring identified by the first set of parentheses. If there is more than one set of extraction parentheses, a new variable is added for each.
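To make the shape of the result concrete, here is a rough base R sketch that adds one column per set of parentheses. The data frame kids and the column names group1 and group2 are assumptions for illustration only; the names that extractMatches itself assigns are not stated in this page.

```r
# Hypothetical illustration of the return shape; the data and the
# column names (group1, group2) are assumptions, not the package's output.
kids <- data.frame(name = c("Julie", "Caitlin", "Abby"),
                   stringsAsFactors = FALSE)
pattern <- "^([A-Z])..(.*)[aeiouy]$"

hits <- regmatches(kids$name, regexec(pattern, kids$name))

n_groups <- 2  # number of parenthesized groups in the pattern
for (i in seq_len(n_groups)) {
  # h[1] is the full match, so group i sits at position i + 1.
  kids[[paste0("group", i)]] <- vapply(
    hits,
    function(h) if (length(h) > i) h[i + 1] else NA_character_,
    character(1)
  )
}

print(kids)
```

Note the difference between the last two rows: "Caitlin" does not match at all, so both new columns are NA, while "Abby" matches with nothing between the third character and the final vowel, so the second column is an empty string.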

Examples

library(DCF)
library(magrittr)  # provides %>%
# Grab the root of names ending in a vowel.
mosaicData::KidsFeet %>% extractMatches(pattern="(.*)[aeiouy]$", name)
# Two groups, for names ending in a vowel:
# the first grabs the first letter (if capitalized);
# the second grabs everything after the third character, up to the final vowel.
mosaicData::KidsFeet %>% extractMatches(pattern="^([A-Z])..(.*)[aeiouy]$", name)

dtkaplan/DCF documentation built on May 15, 2019, 4:57 p.m.