strcapture: Capture String Tokens into a data.frame

strcapture    R Documentation

Capture String Tokens into a data.frame

Description

Given a character vector and a regular expression containing capture expressions, strcapture extracts the captured tokens into a tabular data structure, such as a data.frame, whose type and structure are specified by a prototype object. The function assumes that the same number of tokens is captured from every input string.

Usage

strcapture(pattern, x, proto, perl = FALSE, useBytes = FALSE)

Arguments

pattern

The regular expression with the capture expressions.

x

A character vector in which to capture the tokens.

proto

A data.frame or S4 object that behaves like one. See Details.

perl, useBytes

Arguments passed to regexec.

Details

The proto argument is typically a data.frame, with a column corresponding to each capture expression, in order. The captured character vector is coerced to the type of the column, and the column names are carried over to the return value. Any data in the prototype are ignored. See the examples.
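
The following is a minimal sketch of how the prototype drives the result. The placeholder values stored in proto are illustrative assumptions; only the column names and types are expected to carry over, and the captured strings are coerced to those types.

```r
# Prototype with one row of placeholder data; the data are ignored,
# only the column names and column types are used.
proto <- data.frame(chr = "placeholder", start = 0L, end = 0L,
                    stringsAsFactors = FALSE)

x <- c("chr1:1-1000", "chr2:500-2500")
pattern <- "(.*?):([[:digit:]]+)-([[:digit:]]+)"

res <- strcapture(pattern, x, proto)

# 'chr' stays character; 'start' and 'end' are coerced to integer.
str(res)
```

The result has one row per element of x, not one row per row of proto.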

Value

A tabular data structure of the same type as proto, so typically a data.frame, containing a column for each capture expression. The column types and names are inherited from proto. Cases in x that do not match pattern have NA in every column.
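
As a small illustration of the non-matching case, the sketch below mixes one well-formed input with one that does not match. The second input string is made up for this example.

```r
proto <- data.frame(chr = character(), start = integer(), end = integer())
pattern <- "(.*?):([[:digit:]]+)-([[:digit:]]+)"

# The second element contains no "name:start-end" token.
x <- c("chr1:1-1000", "not a range")

res <- strcapture(pattern, x, proto)

# The row for the non-matching input is NA in every column.
res
```

Rows stay aligned with x, so is.na() on any column can be used to locate inputs that failed to match.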

See Also

regexec and regmatches for related low-level utilities.
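
For comparison, here is a rough sketch of the same extraction done by hand with regexec and regmatches. It assumes every element of x matches and is not meant to reproduce the internals of strcapture; the names tokens, mat, and manual are illustrative.

```r
x <- c("chr1:1-1000", "chr2:500-2500")
pattern <- "(.*?):([[:digit:]]+)-([[:digit:]]+)"

# regexec locates the full match and each capture group;
# regmatches extracts the corresponding substrings.
m <- regexec(pattern, x)
tokens <- regmatches(x, m)

# Each list element holds the full match followed by the captures,
# so bind the elements into a matrix and drop the first column.
mat <- do.call(rbind, tokens)[, -1, drop = FALSE]

# Coerce each column by hand to the desired type.
manual <- data.frame(chr = mat[, 1],
                     start = as.integer(mat[, 2]),
                     end = as.integer(mat[, 3]),
                     stringsAsFactors = FALSE)

# The equivalent call using a prototype.
proto <- data.frame(chr = character(), start = integer(), end = integer())
auto <- strcapture(pattern, x, proto)
```

Unlike this manual version, strcapture also handles inputs that do not match by filling their rows with NA.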

Examples

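## Parse a genomic range of the form "name:start-end"
x <- "chr1:1-1000"
## Three capture groups: sequence name, start position, end position
pattern <- "(.*?):([[:digit:]]+)-([[:digit:]]+)"
## Zero-row prototype: one column per capture group, in order
proto <- data.frame(chr=character(), start=integer(), end=integer())
strcapture(pattern, x, proto)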