lex: Break a string into labelled tokens based upon a set of...


Description

Break a string into labelled tokens based upon a set of patterns

Usage

lex(text, patterns, debug = FALSE)

Arguments

text

a single character string

patterns

a named character vector. Each element is a regex that matches one kind of token, and the element's name is the label for that token. If a regex already contains a capture group it is left as is; otherwise the whole regex is wrapped in a capture group. Patterns are tried in order, so an earlier pattern takes precedence over any later pattern that would also match.

debug

logical. If TRUE, print additional debugging information about the matching. Default: FALSE
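
To illustrate why the order of patterns matters, here is a minimal base R sketch of an ordered, first-match-wins tokeniser. It is not the package's implementation: the function name sketch_lex and the patterns below are hypothetical, and the real lex() may combine and apply the regexes differently.

```r
# Hypothetical illustration only; not the code used by minilexer::lex().
sketch_lex <- function(text, patterns) {
  tokens <- character(0)
  while (nchar(text) > 0) {
    matched <- FALSE
    # Try each pattern in the order given; the first one that matches wins.
    for (label in names(patterns)) {
      anchored <- paste0("^(?:", patterns[[label]], ")")
      m <- regmatches(text, regexpr(anchored, text, perl = TRUE))
      if (length(m) == 1 && nchar(m) > 0) {
        tokens <- c(tokens, setNames(m, label))
        text <- substring(text, nchar(m) + 1)  # consume the matched prefix
        matched <- TRUE
        break
      }
    }
    if (!matched) stop("no pattern matches at: ", text)
  }
  tokens
}

# "\\w+" also matches digits, so the label given to "123" depends on the order.
number_first <- c(number = "\\d+", word = "\\w+", whitespace = "\\s+")
word_first   <- c(word = "\\w+", number = "\\d+", whitespace = "\\s+")

names(sketch_lex("abc 123", number_first))  # "word" "whitespace" "number"
names(sketch_lex("abc 123", word_first))    # "word" "whitespace" "word"
```

With the word pattern listed first, the digits are never labelled as a number, which is why more specific patterns generally belong earlier in the vector.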

Value

a named character vector with the names representing the token type and the contents representing the tokens
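
Because the result is an ordinary named character vector, it can be filtered with standard base R indexing. The sketch below uses a hand-written vector in the shape described above rather than real output from lex(); the token values are assumptions chosen for illustration.

```r
# Hand-built stand-in for a lex() result (assumed values, not real output).
tokens <- c(word = "hello", whitespace = " ", word = "there",
            whitespace = " ", number = "123.45")

# Drop whitespace tokens, keeping the labels on the rest.
significant <- tokens[names(tokens) != "whitespace"]

# Pull out only the number tokens and convert them to numeric.
numbers <- as.numeric(unname(tokens[names(tokens) == "number"]))

# Concatenating every token in order gives back the original string.
rebuilt <- paste(tokens, collapse = "")
```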

Examples

lex("hello there 123.45", patterns=c(number=pattern_number, word="\\w+", whitespace="\\s+"))

coolbutuseless/minilexer documentation built on May 14, 2019, 6:09 a.m.