Extended gregexpr with substring retrieval

Description

An extension of the base function gregexpr enabling retrieval of the matching substrings.

Usage

gregexpr(pattern, text, ignore.case = FALSE, perl = FALSE, fixed = FALSE,
  useBytes = FALSE, extract = FALSE)

Arguments

pattern

Character string containing a regular expression (or character string for fixed = TRUE) to be matched in the given character vector. Coerced by as.character to a character string if possible. If a character vector of length 2 or more is supplied, the first element is used with a warning. Missing values are not allowed.

text

A character vector where matches are sought, or an object which can be coerced by as.character to a character vector.

ignore.case

Logical. If FALSE, pattern matching is case sensitive; if TRUE, case is ignored during matching.

perl

Logical. Should Perl-compatible regular expressions be used?

fixed

Logical. If TRUE, pattern is a string to be matched as is. Overrides all conflicting arguments.

useBytes

Logical. If TRUE, matching is done byte-by-byte rather than character-by-character. See base::gregexpr for details.

extract

Logical indicating if matching substrings should be extracted and returned.
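
As a rough illustration of how the matching arguments interact, the sketch below calls base::gregexpr directly. It assumes, as the description of this function as an extension suggests, that pattern, text, ignore.case and fixed keep their base meaning here. The test string x is made up for the example.

```r
# Base R only; the string x is a made-up example input.
x <- "a.b.A"

# As a regular expression, "." matches any character: every position matches.
pos_regex <- as.integer(base::gregexpr(".", x)[[1]])

# With fixed = TRUE, "." is a literal dot: only positions 2 and 4 match.
pos_fixed <- as.integer(base::gregexpr(".", x, fixed = TRUE)[[1]])

# With ignore.case = TRUE, "a" matches both "a" and "A".
pos_nocase <- as.integer(base::gregexpr("a", x, ignore.case = TRUE)[[1]])

print(pos_regex)
print(pos_fixed)
print(pos_nocase)
```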

Details

An extended version of gregexpr that can return the substrings matching the pattern. The final argument, extract, is the only difference from base::gregexpr. With the default extract=FALSE the behaviour is identical to base::gregexpr; with extract=TRUE the matching substrings are returned instead of the match positions.
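
For reference, the default output can be inspected with base::gregexpr itself, since the default behaviour is stated to be identical. The sketch below uses the sequences from the Examples section. Each list element holds the start positions of the matches, with the match lengths stored in the match.length attribute.

```r
# Base R only: what the default (extract = FALSE) result looks like.
sequences <- c("ACATGTCATGTCC", "CTTGTATGCTG")
m <- base::gregexpr("ATG", sequences)

# Start positions of "ATG" in the first sequence.
starts_first <- as.integer(m[[1]])

# Lengths of those matches, stored as an attribute.
lengths_first <- attr(m[[1]], "match.length")

print(starts_first)
print(lengths_first)
```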

Value

With extract=FALSE, the same value as base::gregexpr. With extract=TRUE, a list of the substrings matching the pattern: there is one list element for each string in text, and each element is a character vector of all matching substrings in the corresponding entry of text.
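
The shape of the extracted result can be sketched with base R alone. This is an illustration, not the package's implementation: it assumes that extract=TRUE yields a list equivalent to what regmatches builds from the base::gregexpr match data.

```r
# Base R only; an assumed equivalent of extract = TRUE, not the package code.
sequences <- c("ACATGTCATGTCC", "CTTGTATGCTG")

# Match positions and lengths, as returned by base::gregexpr.
m <- base::gregexpr("ATG", sequences)

# One list element per input string, each a character vector of matches.
hits <- regmatches(sequences, m)
print(hits)
```

Here the first element holds two matches and the second holds one, mirroring the one-element-per-string layout described above.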

Author(s)

Lars Snipen and Kristian Liland.

See Also

gregexpr

Examples

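# Two short DNA sequences to search
sequences <- c("ACATGTCATGTCC", "CTTGTATGCTG")
# Return the matching substrings instead of the match positions
gregexpr("ATG", sequences, extract = TRUE)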