provigrep: progressive case-insensitive value-grep

provigrep R Documentation

Description

provigrep() performs a progressive, case-insensitive value-grep for a vector of patterns. proigrep() runs the same search with value = FALSE, so it returns index positions instead of matched values.

Usage

provigrep(
  patterns,
  x,
  maxValues = NULL,
  sortFunc = c,
  rev = FALSE,
  returnType = c("vector", "list"),
  ignore.case = TRUE,
  value = TRUE,
  ...
)

proigrep(..., value = FALSE)

Arguments

patterns

character vector of regular expression patterns, ultimately passed to base::grep().

x

character vector that is the subject of base::grep().

maxValues

integer or NULL, the maximum number of matching entries to return per grep pattern. Each grep pattern may match multiple values, and each value is returned at most once, so limiting the entries returned by one pattern may leave an entry available to be matched by a subsequent pattern; see examples. This argument is most commonly used with maxValues=1, which returns only the first matching entry per pattern.

sortFunc

function or NULL, used to sort entries within each set of matching entries. Use NULL to avoid sorting entries.

rev

logical indicating whether to reverse the order of matching entries. Use TRUE to place entries matching the patterns last, and entries not matching the grep patterns first. This technique is effective for placing "noise names" at the end of a long vector, for example.

returnType

character indicating whether to return a vector or a list. A list is returned in the order of the grep patterns, with an empty element for each pattern that matched no entries. This output is useful when you would like to know which pattern matched each entry.

ignore.case

logical passed to base::grep(); TRUE (the default) runs in case-insensitive mode.

value

logical indicating whether to return the matched values (value=TRUE) or their index positions (value=FALSE).

...

additional arguments are passed to vigrep().
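
The interaction between maxValues and the "returned at most once" rule can be easier to see in code. The following is a minimal base-R sketch of that behavior as described above, not the jamba implementation; the helper name capped_progressive_grep() and its max_values argument are hypothetical and exist only for illustration.

```r
# Hypothetical helper, not part of jamba: progressive matching in base R
# with an optional cap on the number of entries kept per pattern.
capped_progressive_grep <- function(patterns, x, max_values = NULL) {
  used <- integer(0)                       # indices of x already returned
  out <- vector("list", length(patterns))
  names(out) <- patterns
  for (i in seq_along(patterns)) {
    hits <- grep(patterns[i], x, ignore.case = TRUE)
    hits <- setdiff(hits, used)            # drop entries claimed by earlier patterns
    if (!is.null(max_values)) {
      hits <- head(hits, max_values)       # keep at most max_values entries
    }
    used <- c(used, hits)
    out[[i]] <- x[hits]
  }
  out
}

pats <- c("[ABCD]", "[CDEF]", "[FGHI]")

# No cap: the first pattern claims A-D, leaving only E and F for the second.
uncapped <- capped_progressive_grep(pats, LETTERS)

# Cap of 1: the first pattern keeps only "A", so "C" is still available
# to the second pattern, and "F" to the third.
capped <- capped_progressive_grep(pats, LETTERS, max_values = 1)
capped
```

In this sketch the capped call yields "A", "C", and "F", whereas without the cap "C" is claimed by the first pattern and never appears under the second.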

Details

The purpose is to provide a "progressive vigrep()", that is, a value-returning, case-insensitive grep driven by an ordered vector of grep patterns. Entries are returned in the order they are matched as each pattern is applied in turn.

It is particularly useful with multiple grep patterns, since grep() does not accept multiple patterns as input. This function also returns only the unique matches, in the order they were matched, which removes the need to run a series of grep() calls and collate their results.

Its main use is prioritized ordering of matching entries, where one would like certain matching entries first, followed by another set of matching entries, without duplication. For example, one might grep for a few patterns, but want hits for certain patterns to be listed first.
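
For comparison, the manual approach that the paragraphs above describe, a series of grep() calls whose results are collated, might look like the base-R sketch below. The input vector and patterns are made up for illustration, and the code approximates the described behavior rather than reproducing jamba's implementation; note that unique() here de-duplicates by value, which is an assumption of this sketch.

```r
# Illustrative data: labels containing several substrings.
x <- c("dog_pizza", "robot_tree", "pizza_noob", "dog_tree")

# Ordered patterns: "pizza" hits first, then "dog" hits, then everything else.
patterns <- c("pizza", "dog", ".")

# One case-insensitive, value-returning grep() per pattern.
per_pattern <- lapply(patterns, function(p) {
  grep(p, x, ignore.case = TRUE, value = TRUE)
})

# Collate in pattern order, keeping only the first occurrence of each entry.
collated <- unique(unlist(per_pattern))
collated
# "dog_pizza" "pizza_noob" "dog_tree" "robot_tree"
```

The trailing "." pattern matches every entry, so anything not claimed by an earlier pattern ends up at the end of the result.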

See Also

Other jam grep functions: grepls(), igrepHas(), igrepl(), igrep(), unigrep(), unvigrep(), vgrep(), vigrep()

Examples

# a rather comical example
# set up a test set with labels containing several substrings
set.seed(1);
testTerms <- c("robot","tree","dog","mailbox","pizza","noob");
testWords <- pasteByRow(t(combn(testTerms,3)));

# now pull out entries matching substrings in order
provigrep(c("pizza", "dog", "noob", "."), testWords);
# more detail about the sort order is shown with returnType="list"
provigrep(c("pizza", "dog", "noob", "."), testWords, returnType="list");
# rev=TRUE will reverse the order of the list
provigrep(c("pizza", "dog", "noob", "."), testWords, returnType="list", rev=TRUE);
provigrep(c("pizza", "dog", "noob", "."), testWords, rev=TRUE);

# another example showing ordering of duplicated entries
set.seed(1);
x <- paste0(
   sample(letters[c(1,2,2,3,3,3,4,4,4,4)]),
   sample(1:5));
x;
# sort by letter
provigrep(letters[1:4], x)

# show more detail about how the sort is performed
provigrep(letters[1:4], x, returnType="list")

# rev=TRUE will reverse the order of pattern matching
# which is most useful when "." is the last pattern:
provigrep(c(letters[1:3], "."), x, returnType="list")
provigrep(c(letters[1:3], "."), x, returnType="list", rev=TRUE)

# example demonstrating maxValues
# return in list format
provigrep(c("[ABCD]", "[CDEF]", "[FGHI]"), LETTERS, returnType="list")

# maxValues=1
provigrep(c("[ABCD]", "[CDEF]", "[FGHI]"), LETTERS, returnType="list", maxValues=1)
provigrep(c("[ABCD]", "[CDEF]", "[FGHI]"), LETTERS, returnType="list", maxValues=1, value=FALSE)
proigrep(c("[ABCD]", "[CDEF]", "[FGHI]"), LETTERS, maxValues=1)


jmw86069/jamba documentation built on Oct. 9, 2024, 10:52 a.m.