getPatternList: Discover regular expressions from string


Description

getPatternList defines regular expressions from symbols. For example, the symbol "SEK-1" generates the regular expression "[a-zA-Z]{3}[\-][0-9]{1}".
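As a rough illustration of the idea, here is a minimal base-R sketch of how a single symbol could be turned into an "individual"-style expression. It is an assumption, not the package's implementation: `symbolToRegex` is a hypothetical helper that maps letters to [a-zA-Z], digits to [0-9], wraps any other character in an escaped class (as in the [\-] above), and collapses runs of letters or digits with an exact {n} count.

```r
# Hypothetical helper (not part of nlpUtilityBelt): builds an
# "individual"-style regular expression for one symbol.
symbolToRegex <- function(symbol) {
  chars <- strsplit(symbol, "")[[1]]
  # Classify each character: letter, digit, or escaped literal separator
  classes <- ifelse(chars %in% c(letters, LETTERS), "[a-zA-Z]",
             ifelse(chars %in% as.character(0:9), "[0-9]",
                    paste0("[\\", chars, "]")))
  runs <- rle(classes)
  isRun <- runs$values %in% c("[a-zA-Z]", "[0-9]")
  # Letter/digit runs get an exact count; separators are repeated literally
  pieces <- ifelse(isRun,
                   paste0(runs$values, "{", runs$lengths, "}"),
                   strrep(runs$values, runs$lengths))
  paste(pieces, collapse = "")
}

cat(symbolToRegex("SEK-1"), "\n")  # [a-zA-Z]{3}[\-][0-9]{1}
```

The result can be checked against the original symbol with grepl(symbolToRegex("SEK-1"), "SEK-1", perl = TRUE), which returns TRUE.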

Usage

getPatternList(symbols, skipSinglePattern = TRUE, res = "individual",
  rm.duplicated = FALSE, ignore.case = FALSE, word.boundary = FALSE)

Arguments

symbols

a list of symbols from which to discover regular expressions

skipSinglePattern

a logical value to select multiple patterns

res

type of regex to create: "generic", "individual", or "restricted". The generic option uses the plus sign to represent repeated occurrences (ex: [a-zA-Z]+[0-9]+), the individual option uses the exact number of occurrences (ex: [a-zA-Z]{2}[0-9]{3}), and the restricted option uses the minimum and maximum number of occurrences for each type of pattern (ex: [a-zA-Z]{2-4}[0-9]{1-}).
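The three res styles differ only in how each run of letters or digits is quantified. The base-R sketch below illustrates that difference. `quantifyRuns` is a hypothetical function, not the package's code; it assumes the symbols contain only letters and digits and share the same letter/digit layout, and it writes the restricted range with the standard {min,max} quantifier syntax.

```r
# Hypothetical sketch: the three `res` styles for symbols made only of
# letters and digits that share the same letter/digit layout.
quantifyRuns <- function(symbols, res = c("generic", "individual", "restricted")) {
  res <- match.arg(res)
  # Run-length encode each symbol as letter/digit classes
  runs <- lapply(strsplit(symbols, ""), function(chars)
    rle(ifelse(chars %in% as.character(0:9), "[0-9]", "[a-zA-Z]")))
  values <- runs[[1]]$values
  if (!all(vapply(runs, function(r) identical(r$values, values), logical(1))))
    stop("all symbols must share the same letter/digit layout")
  # One row per symbol, one column per run of letters or digits
  lens <- do.call(rbind, lapply(runs, function(r) r$lengths))
  switch(res,
    # Any number of occurrences per run
    generic = paste0(values, "+", collapse = ""),
    # Exact counts, one expression per symbol
    individual = apply(lens, 1, function(l)
      paste0(values, "{", l, "}", collapse = "")),
    # Smallest and largest count seen for each run
    restricted = paste0(values, "{", apply(lens, 2, min), ",",
                        apply(lens, 2, max), "}", collapse = ""))
}

symbols <- c("AB123", "ABCD1")
quantifyRuns(symbols, "generic")     # "[a-zA-Z]+[0-9]+"
quantifyRuns(symbols, "individual")  # "[a-zA-Z]{2}[0-9]{3}" "[a-zA-Z]{4}[0-9]{1}"
quantifyRuns(symbols, "restricted")  # "[a-zA-Z]{2,4}[0-9]{1,3}"
```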

rm.duplicated

a logical value indicating whether to remove duplicated regular expressions
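Duplicates are most likely with the generic style, because different symbols with the same letter/digit layout collapse to the same expression. A minimal sketch, assuming letter/digit-only symbols: `genericRegex` is a hypothetical helper, and base R's `unique()` stands in for what rm.duplicated = TRUE is described as doing.

```r
# Hypothetical helper (not part of nlpUtilityBelt): generic-style regex
# for a symbol made only of letters and digits.
genericRegex <- function(symbol) {
  chars <- strsplit(symbol, "")[[1]]
  runs <- rle(ifelse(chars %in% as.character(0:9), "[0-9]", "[a-zA-Z]"))
  paste0(runs$values, "+", collapse = "")
}

patterns <- vapply(c("p53", "Hsp90", "SEK1A"), genericRegex, character(1),
                   USE.NAMES = FALSE)
# "p53" and "Hsp90" yield the same expression, so one copy is dropped
unique(patterns)  # "[a-zA-Z]+[0-9]+" "[a-zA-Z]+[0-9]+[a-zA-Z]+"
```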

ignore.case

whether to differentiate lower and upper case in the patterns. If TRUE, a pattern containing lower- and upper-case letters will have separate expressions, one for lower case (ex: [a-z]) and one for upper case (ex: [A-Z]). If FALSE, a pattern with lower- and upper-case letters will have a combined expression such as [a-zA-Z].
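To make the case distinction concrete, the sketch below maps letters either to separate [A-Z] and [a-z] classes or to one combined [a-zA-Z] class. `letterClasses` and its `splitCase` argument are hypothetical; following the description above, splitCase = TRUE mirrors the documented behavior of ignore.case = TRUE. The sketch assumes the symbol contains only letters and digits.

```r
# Hypothetical helper (not part of nlpUtilityBelt): builds an
# "individual"-style regex, optionally splitting letters by case.
# Assumes the symbol contains only letters and digits.
letterClasses <- function(symbol, splitCase = TRUE) {
  chars <- strsplit(symbol, "")[[1]]
  letterClass <- if (splitCase) {
    # Separate classes for upper- and lower-case letters
    ifelse(chars %in% LETTERS, "[A-Z]", "[a-z]")
  } else {
    # One combined class for all letters
    rep("[a-zA-Z]", length(chars))
  }
  # Digits get their own class regardless of splitCase
  classes <- ifelse(chars %in% as.character(0:9), "[0-9]", letterClass)
  runs <- rle(classes)
  paste0(runs$values, "{", runs$lengths, "}", collapse = "")
}

letterClasses("Hsp90", splitCase = TRUE)   # "[A-Z]{1}[a-z]{2}[0-9]{2}"
letterClasses("Hsp90", splitCase = FALSE)  # "[a-zA-Z]{3}[0-9]{2}"
```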

word.boundary

a logical value indicating whether to surround the regular expressions with word-boundary anchors (\b). This prevents partial matches inside longer words.
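Word-boundary anchors stop an expression from matching inside a longer token. This sketch applies base R's `grepl()` to a hand-written pattern; it shows the effect of the anchors, not the exact string getPatternList produces.

```r
# Individual-style pattern for a symbol shaped like "p53"
regex <- "[a-zA-Z]{1}[0-9]{2}"
# Same pattern surrounded by word-boundary anchors
bounded <- paste0("\\b", regex, "\\b")

text <- c("p53", "ABp53x", "see p53 here")
grepl(regex, text, perl = TRUE)    # TRUE  TRUE TRUE  (matches inside "ABp53x")
grepl(bounded, text, perl = TRUE)  # TRUE FALSE TRUE  (partial match rejected)
```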

Details

The following characters are handled by this function: " ", "_", ".", "-", "@", ":", "#", "/", "*", "&", "?", ";", "$". This function requires the package "plyr".
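The example in the description escapes the hyphen inside a character class ([\-]). Assuming the other handled characters are treated the same way (an assumption, not confirmed by this page), this base-R sketch checks that each escaped class matches its character literally under PCRE (perl = TRUE).

```r
# Characters listed in Details as handled by getPatternList
specials <- c(" ", "_", ".", "-", "@", ":", "#", "/", "*", "&", "?", ";", "$")
# Assumed escaping scheme: wrap each character as [\c], like [\-]
classes <- paste0("[\\", specials, "]")
# Each escaped class should match its own character literally
matched <- mapply(grepl, classes, specials, MoreArgs = list(perl = TRUE))
all(matched)                      # TRUE
# An escaped dot matches only a literal dot, not any character
grepl("[\\.]", "a", perl = TRUE)  # FALSE
```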

Value

A data frame listing the patterns for the given symbols.

Examples

## Get the pattern list
symbols <- c("p53", "ABP-280", "Hsp90")
patternsList <- getPatternList(symbols,
                               skipSinglePattern = TRUE,
                               res = "individual",
                               rm.duplicated = TRUE,
                               ignore.case = TRUE,
                               word.boundary = FALSE)
patternsList

andreysoares/nlpUtilityBelt documentation built on May 6, 2019, 8:57 p.m.