Given a vector of non-decreasing breakpoints in vec, find the interval containing each element of x; i.e., if i <- findInterval(x,v), for each index j in x,
v[i[j]] ≤ x[j] < v[i[j] + 1]
where v[0] := -Inf, v[N+1] := +Inf, and N <- length(v).
At the two boundaries, the returned index may differ by 1, depending on the optional arguments rightmost.closed and all.inside.
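The defining inequality can be checked directly. The following is a minimal sketch, not part of the original page; the data and the helper vector vpad (which pads v with -Inf and +Inf so that vpad[k + 1] plays the role of v[k]) are illustrative assumptions.

```r
v <- c(5, 10, 15)
x <- c(2, 5, 7.5, 10, 14, 15, 20)
i <- findInterval(x, v)

# vpad[k + 1] stands in for v[k], for k = 0, ..., N + 1
vpad <- c(-Inf, v, Inf)

# v[i[j]] <= x[j] < v[i[j] + 1] must hold for every j
stopifnot(vpad[i + 1] <= x, x < vpad[i + 2])
```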
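x: numeric.
vec: numeric, sorted (weakly) increasingly, of length N, say.
rightmost.closed: logical; if true, the rightmost interval, vec[N-1] .. vec[N], is treated as closed, see below.
all.inside: logical; if true, the returned indices are coerced into 1,...,N-1, i.e., 0 is mapped to 1 and N to N-1.
left.open: logical; if true all the intervals are open at left and closed at right; in the formulas below, ≤ should be swapped with < (and > with ≥), and rightmost.closed means 'leftmost is closed'. This may be useful, e.g., in survival analysis computations.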
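As a hedged illustration of how the three logical arguments change the result at and around the breakpoints, the sketch below uses small made-up vectors (v and x are assumptions chosen so that x hits every boundary case).

```r
v <- c(5, 10, 15)          # N = 3 breakpoints
x <- c(4, 5, 10, 15, 16)   # below, on each breakpoint, and above

default   <- findInterval(x, v)                          # 0 1 2 3 3
rightmost <- findInterval(x, v, rightmost.closed = TRUE) # 0 1 2 2 3
inside    <- findInterval(x, v, all.inside = TRUE)       # 1 1 2 2 2
leftopen  <- findInterval(x, v, left.open = TRUE)        # 0 0 1 2 3

cbind(x, default, rightmost, inside, leftopen)
```

With rightmost.closed = TRUE only x = max(v) moves (from 3 to 2); with all.inside = TRUE every index is forced into 1:(N-1); with left.open = TRUE a value equal to a breakpoint belongs to the interval on its left.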
The function findInterval finds the index of one vector x in another, vec, where the latter must be non-decreasing. Although this is trivial, being equivalent to
apply( outer(x, vec, ">="), 1, sum),
the internal algorithm uses interval search, ensuring O(n * log(N)) complexity, where n <- length(x) (and N <- length(vec)). For (almost) sorted x, it will be even faster, basically O(n).
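The equivalence with the outer() formulation can be confirmed on random data. This is a small sketch under assumed inputs (the seed, sizes and ranges are arbitrary); the names slow and fast are illustrative only.

```r
set.seed(1)
vec <- sort(runif(20))        # non-decreasing breakpoints
x   <- runif(50, -0.2, 1.2)   # some values fall outside range(vec)

# O(n * N) reference: count the breakpoints that are <= each x[j]
slow <- apply(outer(x, vec, ">="), 1, sum)

# interval search
fast <- findInterval(x, vec)

stopifnot(all(slow == fast))
```

The outer() version builds an n-by-N logical matrix, so it is only practical for small inputs; it serves here purely as a cross-check.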
This is the same computation as for the empirical distribution function, and indeed, findInterval(t, sort(X)) is identical to n * Fn(t; X[1],..,X[n]) where Fn is the empirical distribution function of X[1],..,X[n].
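The relation to the empirical distribution function can be sketched with stats::ecdf. The sample X and the evaluation grid tt below are made-up inputs; the comparison uses all.equal because n * Fn(tt) is a double while findInterval returns integers.

```r
set.seed(42)
X  <- rnorm(25)
n  <- length(X)
Fn <- stats::ecdf(X)            # empirical distribution function of X
tt <- seq(-3, 3, by = 0.25)     # evaluation points

counts <- findInterval(tt, sort(X))   # number of X[k] <= tt[j]

stopifnot(isTRUE(all.equal(as.numeric(counts), n * Fn(tt))))
```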
When rightmost.closed = TRUE, the result for x[j] = vec[N] ( = max(vec)), is N - 1 as for all other values in the last interval.
left.open = TRUE is occasionally useful, e.g., for survival data.
For (anti-)symmetry reasons, it is equivalent to using
“mirrored” data, i.e., the following is always true:
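    identical(
          findInterval( x,  v,      left.open = TRUE,  ...),
      N - findInterval(-x, -v[N:1], left.open = FALSE, ...))
where N <- length(vec) as above.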
vector of length length(x) with values in 0:N (and NA) where N <- length(vec), or values coerced to 1:(N-1) if and only if all.inside = TRUE (equivalently coercing all x values inside the intervals). Note that NAs are propagated from x, and Inf values are allowed in both x and vec.
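A brief sketch of how missing and infinite values behave, using assumed toy inputs: NA in x yields NA in the result, and infinite values are compared like any other number, whether they occur in x or in vec.

```r
# Inf and NA in x, finite breakpoints
res_x <- findInterval(c(-Inf, NA, 5, Inf), c(0, 10))
res_x    # 0 NA 1 2

# Inf used as outer breakpoints in vec
res_v <- findInterval(c(-1, 0, 10, 1e6), c(-Inf, 0, 10, Inf))
res_v    # 1 2 3 3
```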
approx(*, method = "constant") which is a generalization of findInterval(), ecdf for computing the empirical distribution function which is (up to a factor of n) also basically the same as findInterval(.).
x <- 2:18
v <- c(5, 10, 15) # create two bins [5,10) and [10,15)
cbind(x, findInterval(x, v))

N <- 100
X <- sort(round(stats::rt(N, df = 2), 2))
tt <- c(-100, seq(-2, 2, len = 201), +100)
it <- findInterval(tt, X)
tt[it < 1 | it >= N] # only first and last are outside range(X)

## 'left.open = TRUE' means "mirroring" :
N <- length(v)
stopifnot(identical(
                  findInterval( x,  v,  left.open=TRUE) ,
              N - findInterval(-x, -v[N:1])))