ore_search: Search for matches to a regular expression

ore_search    R Documentation

Search for matches to a regular expression

Description

Search a character vector, or the content of a file or connection, for one or more matches to an Oniguruma-compatible regular expression. Printing and indexing methods are available for the results. ore_match is an alias for ore_search.
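As a rough illustration of searching file content rather than a character vector, the sketch below writes a throwaway temporary file and wraps its path with ore_file before searching. The file content and the variable names (path, first, every) are made up for the example, and ore_file is called with its default arguments.

```r
library(ore)

# Create a small temporary file to search (illustrative content only)
path <- tempfile(fileext = ".txt")
writeLines(c("alpha beta", "gamma delta"), path)

# Search the file for the first word beginning with "d"
first <- ore_search("d\\w+", ore_file(path))
print(first$matches)

# Search the same file for every run of word characters
every <- ore_search("\\w+", ore_file(path), all = TRUE)
print(every$nMatches)

# Remove the temporary file
unlink(path)
```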

Usage

ore_search(regex, text, all = FALSE, start = 1L, simplify = TRUE,
  incremental = !all)

is_orematch(x)

## S3 method for class 'orematch'
x[j, k, ...]

## S3 method for class 'orematches'
x[i, j, k, ...]

## S3 method for class 'orematch'
print(x, lines = getOption("ore.lines", 0L),
  context = getOption("ore.context", 30L), width = getOption("width", 80L),
  ...)

## S3 method for class 'orematches'
print(x, lines = getOption("ore.lines", 0L), simplify = TRUE, ...)

Arguments

regex

A single character string or object of class "ore". In the former case, this will first be passed through ore.

text

A vector of strings to match against, or a connection, or the result of a call to ore_file to search in a file. When an ore_file result is given, match offsets will be relative to the file's encoding.

all

If TRUE, then all matches within each element of text will be found. Otherwise, the search will stop at the first match.

start

An optional vector of offsets (in characters) at which to start searching. Will be recycled to the length of text.

simplify

If TRUE and text is of length 1, an object of class "orematch" will be returned. Otherwise, a list of such objects, with class "orematches", will be returned. When printing "orematches" objects, this controls whether or not to omit nonmatching elements from the output.

incremental

If TRUE and the text argument points to a file, the file is read in increasingly large blocks. This can reduce search time in large files.

x

An R object.

j

For indexing, the match number.

k

For indexing, the group number.

...

For print.orematches, additional arguments to be passed through to print.orematch.

i

For indexing into an "orematches" object only, the string number.

lines

The maximum number of lines to print. The default is zero, meaning no limit. For "orematches" objects this is split evenly between the elements printed.

context

The number of characters of context to include either side of each match.

width

The number of characters in each line of printed output.
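The following sketch shows how the all, start and simplify arguments described above might be combined. The sample strings and variable names are invented for illustration.

```r
library(ore)

text <- c("one two", "three", "4")

# Default (all = FALSE): stop at the first match in each element
first <- ore_search("[a-z]+", text)

# all = TRUE: find every match in each element
every <- ore_search("[a-z]+", text, all = TRUE)

# start: begin searching at the fifth character of a single string
later <- ore_search("[a-z]+", "one two", start = 5L)

# simplify = FALSE: keep the "orematches" list even for one string
wrapped <- ore_search("[a-z]+", "one two", simplify = FALSE)

print(class(first))
print(class(later))
print(class(wrapped))
```

With a single string and simplify = TRUE the result can be indexed directly as an "orematch" object; with simplify = FALSE the same result is wrapped in a list, which is convenient when downstream code should handle one string and many strings uniformly.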

Value

For ore_search, an "orematch" object, or a list of the same, each with elements:

text

A copy of the text element for the current match, if it was a character vector; otherwise a single string with the content retrieved from the file or connection. If the source was a binary file (from ore_file(..., binary=TRUE)) then this element will be NULL.

nMatches

The number of matches found.

offsets

The offsets (in characters) of each match.

byteOffsets

The offsets (in bytes) of each match.

lengths

The lengths (in characters) of each match.

byteLengths

The lengths (in bytes) of each match.

matches

The matched substrings.

groups

Equivalent metadata for each parenthesised subgroup in regex, in a series of matrices. If named groups are present in the regex then dimnames will be set appropriately.

For is_orematch, a logical vector indicating whether the specified object has class "orematch". For extraction with one index, a vector of matched substrings. For extraction with two indices, a vector or matrix of substrings corresponding to captured groups.
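As a brief sketch, the elements listed above can be read straight from a returned "orematch" object with the $ operator. The variable name m is illustrative, and str is used for the groups element so that no particular internal layout is assumed.

```r
library(ore)

m <- ore_search("(\\w)(\\w)", "This is a test", all = TRUE)

# Confirm the class of the result
print(is_orematch(m))

# Number of matches, and their positions and lengths in characters
print(m$nMatches)
print(m$offsets)
print(m$lengths)

# The matched substrings themselves
print(m$matches)

# Inspect the per-group metadata without assuming its exact structure
str(m$groups)
```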

Note

Only named *or* unnamed groups will currently be captured, not both. If there are named groups in the pattern, then unnamed groups will be ignored.
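A small sketch of this behaviour, using a made-up group name (first): the pattern mixes one named and one unnamed group, and only the named group should appear among the captured groups.

```r
library(ore)

# One named group followed by one unnamed group
m <- ore_search("(?<first>\\w)(\\w)", "ab")

# The whole match still covers both characters
print(m$matches)

# Only the named group is expected in the captured groups
print(groups(m))
```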

By default the print method uses the crayon package (if it is available) to determine whether or not the R terminal supports colour. Alternatively, colour printing may be forced or disabled by setting the "ore.colour" (or "ore.color") option to a logical value.
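For instance, colour can be switched off for a single print call and the previous setting restored afterwards. This is a minimal sketch; the context and lines values are arbitrary.

```r
library(ore)

m <- ore_search("\\w+", "This is a test", all = TRUE)

# Disable colour output, remembering the previous option value
old <- options(ore.colour = FALSE)

# Print with ten characters of context and at most four lines
print(m, lines = 4L, context = 10L)

# Restore the previous setting
options(old)
```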

See Also

ore for creating regex objects; matches and groups for an alternative to indexing for extracting matching substrings.

Examples

# Pick out pairs of consecutive word characters
match <- ore_search("(\\w)(\\w)", "This is a test", all=TRUE)

# Find the second matched substring ("is", from "This")
match[2]

# Find the content of the second group in the second match ("s")
match[2,2]
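
The matches and groups functions mentioned under See Also offer an alternative to indexing. The sketch below repeats the search above under a different variable name (m) and extracts the same information.

```r
library(ore)

m <- ore_search("(\\w)(\\w)", "This is a test", all = TRUE)

# All matched substrings, as an alternative to indexing by match number
print(matches(m))

# Captured groups, with one row per match and one column per group
print(groups(m))
```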

ore documentation built on Jan. 17, 2023, 1:10 a.m.