construct: Construct Human Readable Regular Expressions


View source: R/construct.R

Description

This function constructs human-readable regular expressions from sub-expressions. The user may supply optional meta information for each sub-expression: a name and a comment. This lets one write a regular expression much as one writes code. The expression reads top to bottom, the syntax is broken into manageable chunks, and sub-expressions can be indented to reveal structure such as nested groups. Finally, sub-expressions can be commented to give linguistic grounding to the more complex pieces.
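The idea of gluing named, commented pieces into one pattern can be sketched in base R. This is a minimal illustration under stated assumptions, not regexr's implementation: `construct_sketch()` is a hypothetical name, the stand-in `%:)%` stores its note in base R's `comment` attribute (regexr's own operator may work differently), and the attribute names `subs` and `comments` are chosen only for illustration. Run it in a session where regexr is not attached, because the stand-in operator would mask regexr's.

```r
# Hypothetical stand-in for regexr's `%:)%`: attach a note to a string
`%:)%` <- function(expr, note) {
  comment(expr) <- note  # stored in the "comment" attribute, which print() hides
  expr
}

# Hypothetical construct() look-alike: glue the pieces, keep the metadata
construct_sketch <- function(...) {
  pieces <- list(...)
  # One note per piece; NA where no comment was attached
  notes <- vapply(pieces, function(p) {
    if (is.null(comment(p))) NA_character_ else comment(p)
  }, character(1))
  structure(
    paste0(unlist(pieces), collapse = ""),  # the usable regex
    subs     = unlist(pieces),              # sub-expressions, keeping their names
    comments = notes,                       # comments, keyed by the same names
    class    = "regexr_sketch"
  )
}

m <- construct_sketch(
  space = "\\s+"      %:)% "one or more whitespace characters",
  or    = "(;|:)\\s*" %:)% "semicolon or colon, then optional whitespace",
  word  = "[ia]s th[ae]n"
)
unclass(m)[[1]]      # the glued pattern
attr(m, "comments")  # per-piece notes, NA for the uncommented piece
```

Keeping the pieces and notes as attributes, rather than only the pasted string, is one way to let helpers in the spirit of `subs()` and `comments()` recover the original structure later.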

Usage

construct(...)

Arguments

...

A series of comma-separated character strings (sub-expressions) that may optionally be named, commented (see ?`%:)%`), and indented.

Value

Returns a character vector of class regexr. Its attributes retain the original names and comments of the sub-expressions.

Examples

## Minimal Example
minimal <- construct("a", "b", "c")
minimal
unglue(minimal)
comments(minimal)
subs(minimal)
test(minimal)
summary(minimal)

## Example 1
m <- construct(
    space =   "\\s+"              %:)%  "I see",
    simp =    "(?<=(foo))",
    or =      "(;|:)\\s*"         %:)%  "comment on what this does",
    is_then = "[ia]s th[ae]n"
)

m
unglue(m)
summary(m)
subs(m)
comments(m)
subs(m)[4] <- "(FO{2})|(BAR)"
summary(m)
test(m)
## Not run: 
subs(m)[5:7] <- c("(", "([A-Z]|(\\d{5})", ")")
test(m)

## End(Not run)
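test() reports which parts of a regexr object are valid. A minimal base-R sketch of that kind of check follows; it assumes "valid" means "PCRE compiles it without complaint", and `is_valid_regex()` is a hypothetical helper, not regexr's test(). It shows why the pieces added under "Not run" are flagged: a lone "(" or ")" cannot compile on its own, and here the assembled pattern fails too, because the last three pieces open three parentheses but close only two.

```r
# Hypothetical validity check (not regexr's test()): a pattern counts as
# valid if PCRE can compile it without an error or warning
is_valid_regex <- function(pattern) {
  tryCatch(
    {
      grepl(pattern, "", perl = TRUE)
      TRUE
    },
    error   = function(e) FALSE,
    warning = function(w) FALSE
  )
}

# Sub-expressions of `m` after the edits in Example 1
pieces <- c("\\s+", "(?<=(foo))", "(;|:)\\s*", "(FO{2})|(BAR)",
            "(", "([A-Z]|(\\d{5})", ")")

vapply(pieces, is_valid_regex, logical(1), USE.NAMES = FALSE)
is_valid_regex(paste0(pieces, collapse = ""))
```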

library(qdapRegex)
## explain(m)

## Example 2 (Twitter Handle 2 ways)
## Bigger Sub-expressions
twitter <- construct(
  no_at_wrd = "(?<![@\\w])"            %:)%  "Ensure not preceded by @ or a word character",
  at =        "(@)"                    %:)%  "Capture the @ symbol",
  handle =    "(([a-z0-9_]{1,15})\\b)" %:)%  "1 to 15 letters, numbers, or underscores"
)

## Smaller Sub-expressions
twitter <- construct(
  no_at_wrd = "(?<![@\\w])"            %:)%  "Ensure not preceded by @ or a word character",
  at =        "(@)"                    %:)%  "Capture the @ symbol",

  s_gr1 =     "("                      %:)%  "GROUP 1 START",
      handle =    "([a-z0-9_]{1,15})"  %:)%  "1 to 15 letters, numbers, or underscores",
      boundary =  "\\b",
  e_gr1 =     ")"                      %:)%  "GROUP 1 END"
)
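The two Twitter constructions differ only in how finely the pattern is chopped. Assuming construct() joins sub-expressions with no separator (as the minimal example's "a", "b", "c" suggests), both should yield the same regular expression. The base-R sketch below checks that claim by pasting the pieces by hand; the variable names are illustrative.

```r
# Pieces of the "bigger" and "smaller" Twitter constructions, glued with paste0()
bigger  <- paste0("(?<![@\\w])", "(@)", "(([a-z0-9_]{1,15})\\b)")
smaller <- paste0("(?<![@\\w])", "(@)",
                  "(", "([a-z0-9_]{1,15})", "\\b", ")")

identical(bigger, smaller)
```

The finer split costs nothing in the final pattern; it only buys room for per-piece names, comments, and indentation.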

twitter
unglue(twitter)
comments(twitter)
subs(twitter)
summary(twitter)
test(twitter)
## explain(twitter)

x <- c("@hadley I like #rstats for #ggplot2 work.",
    "Difference between #magrittr and #pipeR, both implement pipeline operators for #rstats:
        http://renkun.me/r/2014/07/26/difference-between-magrittr-and-pipeR.html @timelyportfolio",
    "Slides from great talk: @ramnath_vaidya: Interactive slides from Interactive Visualization
        presentation #user2014. http://ramnathv.github.io/user2014-rcharts/#1",
    "[email protected] is my email",
    "A non valid Twitter is @abcdefghijklmnopqrstuvwxyz"
)

library(qdapRegex)
rm_default(x, pattern = twitter, extract = TRUE)
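rm_default() from qdapRegex performs the extraction above. For readers without qdapRegex, a similar extraction can be sketched with base R's gregexpr() and regmatches(); this is an approximation, not what rm_default() does internally. The sample strings are illustrative, and the e-mail address is a made-up placeholder. Note that `perl = TRUE` is required for the lookbehind.

```r
# Twitter-handle pattern from Example 2, written out as one string
twitter_pattern <- "(?<![@\\w])(@)(([a-z0-9_]{1,15})\\b)"

tweets <- c(
  "@hadley I like #rstats for #ggplot2 work.",
  "user@example.com is my email",                        # hypothetical address
  "A non valid Twitter is @abcdefghijklmnopqrstuvwxyz"
)

# One character vector of matches per input string
handles <- regmatches(tweets, gregexpr(twitter_pattern, tweets, perl = TRUE))
handles
```

The e-mail address is skipped because its @ follows a word character, and the 26-letter handle is skipped because no word boundary follows any run of 1 to 15 of its letters.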

## Example 3 (Modular Sub-expression Chunks)
combined <- construct(
    twitter = twitter               %:)% "Twitter regex created previously",
    or =      "|"                   %:)% "Join handle regex & hash tag regex",
    hash =    grab("@rm_hash")      %:)% "Twitter hash tag regex"
)

combined
unglue(combined)
comments(combined)
subs(combined)
summary(combined)
test(combined)
## explain(combined)
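Example 3 joins two patterns with a bare "|". A base-R sketch of the same composition follows. The hashtag pattern here is a hypothetical stand-in, not the regex that grab("@rm_hash") returns.

```r
handle_pattern   <- "(?<![@\\w])(@)(([a-z0-9_]{1,15})\\b)"
hash_pattern     <- "(#\\w+)"  # hypothetical stand-in for grab("@rm_hash")
combined_pattern <- paste0(handle_pattern, "|", hash_pattern)

txt   <- "@hadley I like #rstats for #ggplot2 work."
found <- regmatches(txt, gregexpr(combined_pattern, txt, perl = TRUE))[[1]]
found
```

Because "|" binds loosest, it splits the entire handle pattern from the entire hashtag pattern. If more pieces were appended after the hashtag pattern, each alternative would need its own (?:...) group.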

## Different Structure (no names): Example from Martin Fowler
## *Note: Fowler argues for choosing regex representations and names
## that make the regex's purpose evident, and for commenting only
## where needed. See:
## browseURL("http://martinfowler.com/bliki/ComposedRegex.html")

## Fowler's C# string delimiters (@" and ";) are omitted: they are not part of the regex
pattern <- construct(
    '^score',
    '\\s+',
    '(\\d+)'          %:)% 'points',
    '\\s+',
    'for',
    '\\s+',
    '(\\d+)'          %:)% 'number of nights',
    '\\s+',
    'night',
    's?'              %:)% 'optional plural',
    '\\s+',
    'at',
    '\\s+',
    '(.*)'            %:)% 'hotel name'
)

summary(pattern)
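Fowler's C# verbatim-string delimiters (@" and ";) are not part of the regular expression itself. The base-R sketch below pastes only the regex pieces and applies the result to a sample input line (made up for illustration) to show what the three capture groups return.

```r
# Fowler-style pattern assembled from the regex pieces only
fowler <- paste0(
  "^score", "\\s+", "(\\d+)", "\\s+", "for", "\\s+", "(\\d+)", "\\s+",
  "night", "s?", "\\s+", "at", "\\s+", "(.*)"
)

line <- "score 120 for 3 nights at Grand Hotel"

# regexec() returns the full match followed by each capture group
res <- regmatches(line, regexec(fowler, line, perl = TRUE))[[1]]
res
```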

regexr documentation built on May 29, 2017, 5:57 p.m.