ore_search: Search for matches to a regular expression
In ore: An R Interface to the Onigmo Regular Expression Library

ore_search

R Documentation

Search for matches to a regular expression

Description

Search a character vector, or the content of a file or connection, for one or more matches to an Oniguruma-compatible regular expression. Printing and indexing methods are available for the results. ore_match is an alias for ore_search.
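As a rough illustration of searching file content rather than a character vector, the sketch below writes a small temporary file and searches it through `ore_file`. The file contents and the pattern are made up for the example, and it assumes the ore package is installed.

```r
library(ore)

# Hypothetical sample data, written to a temporary file for the demonstration
path <- tempfile(fileext = ".txt")
writeLines(c("alpha 1", "beta 22", "gamma 333"), path)

# Wrap the path with ore_file() so that the file's content is searched
file_match <- ore_search("\\d+", ore_file(path), all = TRUE)

# The matched substrings, as a character vector
file_match$matches

# Clean up the temporary file
unlink(path)
```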

Usage

ore_search(regex, text, all = FALSE, start = 1L, simplify = TRUE,
  incremental = !all)

is_orematch(x)

## S3 method for class 'orematch'
x[j, k, ...]

## S3 method for class 'orematches'
x[i, j, k, ...]

## S3 method for class 'orematch'
print(x, lines = getOption("ore.lines", 0L),
  context = getOption("ore.context", 30L), width = getOption("width", 80L),
  ...)

## S3 method for class 'orematches'
print(x, lines = getOption("ore.lines", 0L),
  simplify = TRUE, ...)

Arguments

`regex`	A single character string or object of class `"ore"`. In the former case, this will first be passed through `ore`.
`text`	A vector of strings to match against, a connection, or the result of a call to `ore_file` to search in a file. When searching a file, match offsets will be relative to the file's encoding.
`all`	If `TRUE`, then all matches within each element of `text` will be found. Otherwise, the search will stop at the first match.
`start`	An optional vector of offsets (in characters) at which to start searching. Will be recycled to the length of `text`.
`simplify`	If `TRUE` and `text` is of length 1, a single object of class `"orematch"` will be returned. Otherwise (if `simplify` is `FALSE`, or `text` has more than one element), a list of such objects, with class `"orematches"`, will be returned. When printing `"orematches"` objects, this argument instead controls whether or not to omit nonmatching elements from the output.
`incremental`	If `TRUE` and the `text` argument points to a file, the file is read in increasingly large blocks. This can reduce search time in large files.
`x`	An R object.
`j`	For indexing, the match number.
`k`	For indexing, the group number.
`...`	For `print.orematches`, additional arguments to be passed through to `print.orematch`.
`i`	For indexing into an `"orematches"` object only, the string number.
`lines`	The maximum number of lines to print. The default is zero, meaning no limit. For `"orematches"` objects this limit is split evenly between the elements printed.
`context`	The number of characters of context to include on either side of each match.
`width`	The number of characters in each line of printed output.
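The following sketch shows how the main search arguments might be combined. The sample text and pattern are invented for illustration, and the code assumes the ore package is installed.

```r
library(ore)

text <- "a1 b22 c333"

# Default behaviour: stop at the first match
first <- ore_search("\\d+", text)
first$matches

# all = TRUE: collect every match in the string
every <- ore_search("\\d+", text, all = TRUE)
every$nMatches

# start = 3L: begin searching at the third character, skipping "a1"
later <- ore_search("\\d+", text, start = 3L)
later$matches

# simplify = FALSE: return an "orematches" list even for a single string
listed <- ore_search("\\d+", text, simplify = FALSE)
class(listed)

# A precompiled "ore" object can be passed in place of a pattern string
digits <- ore("\\d+")
ore_search(digits, c("x 10", "y 200"), all = TRUE)
```

Compiling the pattern once with `ore` may be worthwhile when the same expression is applied to many inputs, since a plain string is passed through `ore` on every call.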

Value

For ore_search, an "orematch" object, or a list of the same, each with the following elements:

text: A copy of the text element for the current match, if it was a character vector; otherwise a single string with the content retrieved from the file or connection. If the source was a binary file (from ore_file(..., binary=TRUE)) then this element will be NULL.
nMatches: The number of matches found.
offsets: The offsets (in characters) of each match.
byteOffsets: The offsets (in bytes) of each match.
lengths: The lengths (in characters) of each match.
byteLengths: The lengths (in bytes) of each match.
matches: The matched substrings.
groups: Equivalent metadata for each parenthesised subgroup in regex, in a series of matrices. If named groups are present in the regex then dimnames will be set appropriately.
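The elements listed above can be read directly from the returned object with `$`. The sketch below is illustrative only: it assumes the ore package is installed, and the internal layout of the `groups` element is inspected with `str` rather than assumed.

```r
library(ore)

m <- ore_search("(\\w)(\\w)", "This is a test", all = TRUE)

m$nMatches   # number of matches found
m$matches    # the matched substrings
m$offsets    # character offsets of each match
m$lengths    # character lengths of each match (2 for every match here)

# Group-level metadata; inspect the structure rather than assuming its layout
str(m$groups)

# With multibyte characters, byte-based values can differ from
# character-based ones
u <- ore_search("caf\u00e9", "na\u00efve caf\u00e9")
c(chars = u$lengths, bytes = u$byteLengths)
c(chars = u$offsets, bytes = u$byteOffsets)
```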

For is_orematch, a logical vector indicating whether the specified object has class "orematch".

For extraction with one index, a vector of matched substrings. For extraction with two indices, a vector or matrix of substrings corresponding to captured groups.
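A brief sketch of the class check and the two forms of extraction, reusing the pattern from the Examples section. It assumes the ore package is installed.

```r
library(ore)

m <- ore_search("(\\w)(\\w)", "This is a test", all = TRUE)

is_orematch(m)            # the search result has class "orematch"
is_orematch("a string")   # a plain character vector does not

m[2]        # one index: the second matched substring
m[1:3]      # a vector of indices selects several matched substrings
m[2, 1]     # two indices: the first group of the second match
m[2, 1:2]   # both groups of the second match
```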

Note

Only named *or* unnamed groups will currently be captured, not both. If there are named groups in the pattern, then unnamed groups will be ignored.
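To illustrate the point about mixing group types, the sketch below names only the first of two groups, using the `(?<name>...)` syntax. The group name `first` is hypothetical, and the code assumes the ore package is installed.

```r
library(ore)

# The first group is named; the second is unnamed and so should be ignored
m <- ore_search("(?<first>\\w)(\\w)", "This is a test", all = TRUE)

# Only the named group is expected to appear in the group metadata
str(m$groups)

# The first (and only captured) group of the second match
m[2, 1]
```

If both pieces are needed, naming every group in the pattern is one way to have all of them captured.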

By default the print method uses the crayon package (if it is available) to determine whether or not the R terminal supports colour. Alternatively, colour printing may be forced or disabled by setting the "ore.colour" (or "ore.color") option to a logical value.
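The printing behaviour described above can be adjusted through options or through arguments to `print`. The following is a minimal sketch, assuming the ore package is installed; the specific values are arbitrary.

```r
library(ore)

m <- ore_search("(\\w)(\\w)", "This is a test", all = TRUE)

# Disable colour regardless of what the terminal supports,
# keeping the previous setting so it can be restored
old <- options(ore.colour = FALSE)
print(m)

# Narrow the context shown around each match and cap the number of lines
print(m, lines = 4L, context = 10L)

# Restore the previous option value
options(old)
```

The `"ore.lines"` and `"ore.context"` options shown in the Usage section supply the defaults for `lines` and `context`, so they can be set once per session in the same way.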

Examples

library(ore)

# Pick out pairs of consecutive word characters
match <- ore_search("(\\w)(\\w)", "This is a test", all=TRUE)

# Find the second matched substring ("is", from "This")
match[2]

# Find the content of the second group in the second match ("s")
match[2,2]

ore documentation built on April 4, 2025, 4:42 a.m.