ore.search: Search for matches to a regular expression

Description

Search a character vector for one or more matches to an Oniguruma-compatible regular expression. The result is of class "orematches", for which printing and indexing methods are available.

Usage

ore.search(regex, text, all = FALSE, start = 1L, simplify = TRUE,
  incremental = !all)

is.orematch(x)

## S3 method for class 'orematch'
x[j, k, ...]

## S3 method for class 'orematches'
x[i, j, k, ...]

## S3 method for class 'orematch'
print(x, lines = NULL, context = NULL, width = NULL,
  ...)

## S3 method for class 'orematches'
print(x, ...)

Arguments

regex

A single character string or object of class "ore". In the former case, this will first be passed through ore.

text

A vector of strings to match against, or the result of a call to ore.file to search in a file. In the latter case, match offsets will be relative to the file's encoding.

all

If TRUE, then all matches within each element of text will be found. Otherwise, the search will stop at the first match.

start

An optional vector of offsets (in characters) at which to start searching. Will be recycled to the length of text.

simplify

If TRUE, a single object of class "orematch" will be returned when text is of length 1. Otherwise (that is, if simplify is FALSE or text has more than one element), a list of such objects, with class "orematches", will be returned.

incremental

If TRUE and the text argument points to a file, the file is read in increasingly large blocks. This can reduce search time in large files.

x

An R object.

j

For indexing, the match number.

k

For indexing, the group number.

...

Ignored.

i

For indexing into an "orematches" object only, the string number.

lines

The maximum number of lines to print. If NULL, this defaults to the value of the "ore.lines" option, or 0 if that is unset or invalid. Zero means no limit.

context

The number of characters of context to include either side of each match. If NULL, this defaults to the value of the "ore.context" option, or 30 if that is unset or invalid.

width

The number of characters in each line of printed output. If NULL, this defaults to the value of the standard "width" option.
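The interaction between all, start and simplify can be sketched as follows. This is a minimal illustration that assumes the ore package is installed; the variable names (text, first, every, later, listed) are made up for the example.

```r
library(ore)

text <- "This is a test"

# Default (all = FALSE): stop at the first match in the string
first <- ore.search("is", text)

# all = TRUE: find every match in the string
every <- ore.search("is", text, all = TRUE)

# start = 4L: begin at the fourth character, so the "is" inside "This"
# (which begins at character 3) is skipped
later <- ore.search("is", text, start = 4L)

# simplify = FALSE: request an "orematches" list even for a single string
listed <- ore.search("is", text, simplify = FALSE)
class(listed)
```

Individual elements of an "orematches" list can then be pulled out with the usual list accessor, for example listed[[1]].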

Value

For ore.search, an "orematch" object, or a list of the same, each with the following elements:

text

A copy of the text element for the current match.

nMatches

The number of matches found.

offsets

The offsets (in characters) of each match.

byteOffsets

The offsets (in bytes) of each match.

lengths

The lengths (in characters) of each match.

byteLengths

The lengths (in bytes) of each match.

matches

The matched substrings.

groups

Equivalent metadata for each parenthesised subgroup in regex, in a series of matrices. If named groups are present in the regex then dimnames will be set appropriately.

For is.orematch, a logical value indicating whether the specified object has class "orematch". For extraction with a match number only, a vector of matched substrings. For extraction with both a match number and a group number, a vector or matrix of substrings corresponding to captured groups.
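As a rough sketch of how the returned elements might be inspected, the snippet below reuses the pattern from the Examples section. It assumes the ore package is installed. The layout of the groups element (a list holding a matches matrix with one row per match) is an assumption based on the description above, not something this page spells out.

```r
library(ore)

m <- ore.search("(\\w)(\\w)", "This is a test", all = TRUE)

is.orematch(m)    # check the class of the result

m$nMatches        # number of matches found
m$matches         # the matched substrings
m$offsets         # offsets of each match, in characters
m$lengths         # lengths of each match, in characters

# Assumed layout: group substrings as a matrix, one row per match
m$groups$matches

# Indexing is the documented route to the same information
m[2]              # second matched substring
m[2, 2]           # second group of the second match
```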

Note

Only named *or* unnamed groups will currently be captured, not both. If there are named groups in the pattern, then unnamed groups will be ignored.
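The restriction on mixing group types can be illustrated with a small sketch. It assumes the ore package is installed, uses the (?<name>...) named-group syntax from Oniguruma, and relies on groups (listed under See Also) returning a matrix with one column per captured group. The group names first and second are invented for the example.

```r
library(ore)

# Both groups are named, so both are captured
named <- ore.search("(?<first>\\w)(?<second>\\w)", "This is a test", all = TRUE)
groups(named)

# One named and one unnamed group: per the note above, the unnamed
# group is ignored, leaving only the named one in the result
mixed <- ore.search("(?<first>\\w)(\\w)", "This is a test", all = TRUE)
groups(mixed)
```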

By default the print method uses the crayon package (if it is available) to determine whether or not the R terminal supports colour. Alternatively, colour printing may be forced or disabled by setting the "ore.colour" (or "ore.color") option to a logical value.
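A minimal sketch of controlling the printed output is shown below, assuming the ore package is installed. The option name and the context and lines arguments come from this page; the particular values are arbitrary.

```r
library(ore)

m <- ore.search("(\\w)(\\w)", "This is a test", all = TRUE)

# Disable colour printing, remembering the previous setting
old <- options(ore.colour = FALSE)
print(m)

# Override the context width and line limit for a single call
print(m, context = 5L, lines = 2L)

# Restore the previous option value
options(old)
```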

See Also

ore for creating regex objects; matches and groups for an alternative to indexing for extracting matching substrings.

Examples

# Pick out pairs of consecutive word characters
match <- ore.search("(\\w)(\\w)", "This is a test", all=TRUE)

# Find the second matched substring ("is", from "This")
match[2]

# Find the content of the second group in the second match ("s")
match[2,2]

ore documentation built on Aug. 30, 2018, 9:05 a.m.