fstrcapture: Capture string tokens into a data frame


fstrcapture R Documentation

Capture string tokens into a data frame

Description

fstrcapture() is a more efficient alternative to strcapture() when using Perl-compatible regular expressions. It is built on regexpr() and returns only the first occurrence of the captures in each string. gstrcapture(), built on gregexpr(), returns all occurrences.
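To illustrate the mechanism the description refers to, the following is a minimal base R sketch of how capture groups can be pulled out of the first match using regexpr() with perl = TRUE. It is not the package's implementation: the helper name first_captures() is hypothetical, it returns a character matrix rather than a data frame, and it assumes the pattern contains at least one capture group.

```r
# Hypothetical helper for illustration only; not part of ympes.
# Assumes `pattern` contains at least one capture group.
first_captures <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)

  # With perl = TRUE, regexpr() attaches per-group start positions,
  # lengths and names as attributes of the result.
  starts <- attr(m, "capture.start")
  lens   <- attr(m, "capture.length")

  out <- matrix(NA_character_, nrow = length(x), ncol = ncol(starts))
  for (j in seq_len(ncol(starts))) {
    out[, j] <- substring(x, starts[, j], starts[, j] + lens[, j] - 1L)
  }

  # Elements with no match (or NA input) get NA in every column.
  no_match <- as.vector(is.na(m) | m == -1L)
  out[no_match, ] <- NA_character_

  colnames(out) <- attr(m, "capture.names")
  out
}

notables <- c("  Ben Franklin and Jefferson Davis", "\tMillard Fillmore", "none")
pattern <- "(?<first>[[:upper:]][[:lower:]]+) (?<last>[[:upper:]][[:lower:]]+)"
res <- first_captures(notables, pattern)
print(res)
```

Only the first match per string is reported, so "Jefferson Davis" does not appear in the result. Retrieving it would require gregexpr(), which returns the same capture attributes for every match in each element.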

Usage

fstrcapture(x, pattern, proto)

gstrcapture(x, pattern, proto)

Arguments

x

A character vector in which to capture the tokens.

pattern

The regular expression with the capture expressions.

proto

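A data.frame or S4 object that behaves like one. It supplies the column types and the default column names of the result (see Value). If omitted, all captures are returned as character.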

Value

A tabular data structure of the same type as proto, so typically a data.frame, containing a column for each capture expression. The column types are inherited from proto, as are the names unless the captures themselves are named (in which case these are prioritised). Cases in x that do not match the pattern have NA in every column. For gstrcapture() there is an additional column, string_id, which links the output to the relevant element of the input vector.
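The type inheritance and NA handling described above can be seen with base R's strcapture(), which this function is presented as an alternative to. The sketch below uses only base R and is intended as a point of comparison, not as a statement about the internals of fstrcapture(). Note that strcapture() takes pattern before x, whereas the Usage section above lists x first.

```r
# Base R analogue for comparison; the second element deliberately does not match.
x <- c("chr1:1-1000", "no match here")
pattern <- "(.*?):([[:digit:]]+)-([[:digit:]]+)"

# Zero-row prototype: only its column names and types are used.
proto <- data.frame(chr = character(), start = integer(), end = integer())

res <- strcapture(pattern, x, proto, perl = TRUE)

# start and end come back as integer, and the non-matching
# element is NA in every column.
str(res)
```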

See Also

strcapture().

Examples


# from regexpr example -------------------------------------------------

# if named capture then pass names on irrespective of proto
notables <- c("  Ben Franklin and Jefferson Davis", "\tMillard Fillmore")
pattern <- "(?<first>[[:upper:]][[:lower:]]+) (?<last>[[:upper:]][[:lower:]]+)"
proto <- data.frame(a="", b="")
fstrcapture(notables, pattern, proto)
gstrcapture(notables, pattern, proto)

# from strcapture example ----------------------------------------------
# if unnamed capture then proto names used
x <- "chr1:1-1000"
pattern <- "(.*?):([[:digit:]]+)-([[:digit:]]+)"
proto <- data.frame(chr=character(), start=integer(), end=integer())
fstrcapture(x, pattern, proto)

# if no proto supplied then all captures treated as character
str(fstrcapture(x, pattern))
str(fstrcapture(x, pattern, proto))


ympes documentation built on April 15, 2025, 1:17 a.m.