vec_locate_matches | R Documentation |
vec_locate_matches()
is a more flexible version of vec_match()
used to
identify locations where each value of needles
matches one or multiple
values in haystack
. Unlike vec_match()
, vec_locate_matches()
returns
all matches by default, and can match on binary conditions other than
equality, such as >
, >=
, <
, and <=
.
vec_locate_matches(
needles,
haystack,
...,
condition = "==",
filter = "none",
incomplete = "compare",
no_match = NA_integer_,
remaining = "drop",
multiple = "all",
relationship = "none",
nan_distinct = FALSE,
chr_proxy_collate = NULL,
needles_arg = "needles",
haystack_arg = "haystack",
error_call = current_env()
)
needles , haystack |
Vectors used for matching.
Prior to comparison, |
... |
These dots are for future extensions and must be empty. |
condition |
Condition controlling how
|
filter |
Filter to be applied to the matched results.
Filters don't have any effect on A filter can return multiple haystack matches for a particular needle
if the maximum or minimum haystack value is duplicated in |
incomplete |
Handling of missing and incomplete
values in
|
no_match |
Handling of
|
remaining |
Handling of
|
multiple |
Handling of
|
relationship |
Handling of the expected relationship between
|
nan_distinct |
A single logical specifying whether or not |
chr_proxy_collate |
A function generating an alternate representation of character vectors to use for collation, often used for locale-aware ordering.
For data frames, Common transformation functions include: |
needles_arg , haystack_arg |
Argument tags for |
error_call |
The execution environment of a currently
running function, e.g. |
vec_match()
is identical to (but often slightly faster than):
vec_locate_matches( needles, haystack, condition = "==", multiple = "first", nan_distinct = TRUE )
vec_locate_matches()
is extremely similar to a SQL join between needles
and haystack
, with the default being most similar to a left join.
Be very careful when specifying match condition
s. If a condition is
misspecified, it is very easy to accidentally generate an exponentially
large number of matches.
A two column data frame containing the locations of the matches.
needles
is an integer vector containing the location of
the needle currently being matched.
haystack
is an integer vector containing the location of the
corresponding match in the haystack for the current needle.
vec_locate_matches()
vec_order_radix()
vec_detect_complete()
x <- c(1, 2, NA, 3, NaN)
y <- c(2, 1, 4, NA, 1, 2, NaN)
# By default, for each value of `x`, all matching locations in `y` are
# returned
matches <- vec_locate_matches(x, y)
matches
# The result can be used to slice the inputs to align them
data_frame(
x = vec_slice(x, matches$needles),
y = vec_slice(y, matches$haystack)
)
# If multiple matches are present, control which is returned with `multiple`
vec_locate_matches(x, y, multiple = "first")
vec_locate_matches(x, y, multiple = "last")
vec_locate_matches(x, y, multiple = "any")
# Use `relationship` to add constraints and error on multiple matches if
# they aren't expected
try(vec_locate_matches(x, y, relationship = "one-to-one"))
# In this case, the `NA` in `y` matches two rows in `x`
try(vec_locate_matches(x, y, relationship = "one-to-many"))
# By default, `NA` is treated as being identical to `NaN`.
# Using `nan_distinct = TRUE` treats `NA` and `NaN` as different values, so
# `NA` can only match `NA`, and `NaN` can only match `NaN`.
vec_locate_matches(x, y, nan_distinct = TRUE)
# If you never want missing values to match, set `incomplete = NA` to return
# `NA` in the `haystack` column anytime there was an incomplete value
# in `needles`.
vec_locate_matches(x, y, incomplete = NA)
# Using `incomplete = NA` allows us to enforce the one-to-many relationship
# that we couldn't before
vec_locate_matches(x, y, relationship = "one-to-many", incomplete = NA)
# `no_match` allows you to specify the returned value for a needle with
# zero matches. Note that this is different from an incomplete value,
# so specifying `no_match` allows you to differentiate between incomplete
# values and unmatched values.
vec_locate_matches(x, y, incomplete = NA, no_match = 0L)
# If you want to require that every `needle` has at least 1 match, set
# `no_match` to `"error"`:
try(vec_locate_matches(x, y, incomplete = NA, no_match = "error"))
# By default, `vec_locate_matches()` detects equality between `needles` and
# `haystack`. Using `condition`, you can detect where an inequality holds
# true instead. For example, to find every location where `x[[i]] >= y`:
matches <- vec_locate_matches(x, y, condition = ">=")
data_frame(
x = vec_slice(x, matches$needles),
y = vec_slice(y, matches$haystack)
)
# You can limit which matches are returned with a `filter`. For example,
# with the above example you can filter the matches returned by `x[[i]] >= y`
# down to only the ones containing the maximum `y` value of those matches.
matches <- vec_locate_matches(x, y, condition = ">=", filter = "max")
# Here, the matches for the `3` needle value have been filtered down to
# only include the maximum haystack value of those matches, `2`. This is
# often referred to as a rolling join.
data_frame(
x = vec_slice(x, matches$needles),
y = vec_slice(y, matches$haystack)
)
# In the very rare case that you need to generate locations for a
# cross match, where every value of `x` is forced to match every
# value of `y` regardless of what the actual values are, you can
# replace `x` and `y` with integer vectors of the same size that contain
# a single value and match on those instead.
x_proxy <- vec_rep(1L, vec_size(x))
y_proxy <- vec_rep(1L, vec_size(y))
nrow(vec_locate_matches(x_proxy, y_proxy))
vec_size(x) * vec_size(y)
# By default, missing values will match other missing values when using
# `==`, `>=`, or `<=` conditions, but not when using `>` or `<` conditions.
# This is similar to how `vec_compare(x, y, na_equal = TRUE)` works.
x <- c(1, NA)
y <- c(NA, 2)
vec_locate_matches(x, y, condition = "<=")
vec_locate_matches(x, y, condition = "<")
# You can force missing values to match regardless of the `condition`
# by using `incomplete = "match"`
vec_locate_matches(x, y, condition = "<", incomplete = "match")
# You can also use data frames for `needles` and `haystack`. The
# `condition` will be recycled to the number of columns in `needles`, or
# you can specify varying conditions per column. In this example, we take
# a vector of date `values` and find all locations where each value is
# between lower and upper bounds specified by the `haystack`.
values <- as.Date("2019-01-01") + 0:9
needles <- data_frame(lower = values, upper = values)
set.seed(123)
lower <- as.Date("2019-01-01") + sample(10, 10, replace = TRUE)
upper <- lower + sample(3, 10, replace = TRUE)
haystack <- data_frame(lower = lower, upper = upper)
# (values >= lower) & (values <= upper)
matches <- vec_locate_matches(needles, haystack, condition = c(">=", "<="))
data_frame(
lower = vec_slice(lower, matches$haystack),
value = vec_slice(values, matches$needle),
upper = vec_slice(upper, matches$haystack)
)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.