View source: R/my_RcppExports.R
fquantile | R Documentation |
A faster alternative to quantile
(written fully in C), that supports sampling weights, and can also quickly compute quantiles from an ordering vector (e.g. order(x)
). frange
provides a fast alternative to range
.
fquantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), w = NULL,
o = if(length(x) > 1e5L && length(probs) > log(length(x)))
radixorder(x) else NULL,
na.rm = .op[["na.rm"]], type = 7L, names = TRUE,
check.o = is.null(attr(o, "sorted")))
# Programmers version: no names, intelligent defaults, or checks
.quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), w = NULL, o = NULL,
na.rm = TRUE, type = 7L, names = FALSE, check.o = FALSE)
# Fast range (min and max)
frange(x, na.rm = .op[["na.rm"]], finite = FALSE)
.range(x, na.rm = TRUE, finite = FALSE)
x |
a numeric or integer vector. |
probs |
numeric vector of probabilities with values in [0,1]. |
w |
a numeric vector of sampling weights. Missing weights are only supported if |
o |
integer. An vector giving the ordering of the elements in |
na.rm |
logical. Remove missing values, default |
finite |
logical. Omit all non-finite values. |
type |
integer. Quantile types 5-9. See |
names |
logical. Generates names of the form |
check.o |
logical. If |
fquantile
is implemented using a quickselect algorithm in C, inspired by data.table's gmedian
. The algorithm is applied incrementally to different sections of the array to find individual quantiles. If many quantile probabilities are requested, sorting the whole array with the fast radixorder
algorithm is more efficient. The default threshold for this (length(x) > 1e5L && length(probs) > log(length(x))
) is conservative, given that quickselect is generally more efficient on longitudinal data with similar values repeated by groups. With random data, my investigations yield that a threshold of length(probs) > log10(length(x))
would be more appropriate.
Weighted quantile estimation, in a nutshell, is done by internally calling radixorder(x)
(unless o
is supplied), and summing the weights in order until the lowest required order statistic j
is found, which corresponds to exceeding a target sum of weights that is a function of the probability p
, the quantile method (see quantile
), the total sum of weights, and the smallest (non-zero) weight. For quantile type 7 the target sum is sumwp = (sum(w) - min(w)) * p
(resembling (n - 1) * p
in the unweighted case). Then, a continuous index h
in [0, 1] is determined as one minus the difference between the sum of weights associated with j
and the target sum, divided by the weight of element j
, that is h = 1 - (sumwj - sumwp) / w[j]
. A weighted quantile can then be computed as a weighted average of 2 order statistics, exactly as in the unweighted case: WQ[i](p) = (1 - h) x[j] + h x[j+1]
. If the order statistic j+1
has a zero weight, j+2
is taken (or j+3
if j+2
also has zero weight etc..). The Examples section provides a demonstration in R that is roughly equivalent to the algorithm just outlined.
frange
is considerably more efficient than range
, which calls both min
and max
, and thus requires 2 full passes instead of 1 required by frange
. If only probabilities 0
and 1
are requested, fquantile
internally calls frange
.
A vector of quantiles. If names = TRUE
, fquantile
generates names as paste0(round(probs * 100, 1), "%")
(in C).
fnth
, Fast Statistical Functions, Collapse Overview
frange(mtcars$mpg)
## Checking computational equivalence to stats::quantile()
w = alloc(abs(rnorm(1)), 32)
o = radixorder(mtcars$mpg)
for (i in 5:9) print(all_obj_equal(fquantile(mtcars$mpg, type = i),
fquantile(mtcars$mpg, type = i, w = w),
fquantile(mtcars$mpg, type = i, o = o),
fquantile(mtcars$mpg, type = i, w = w, o = o),
quantile(mtcars$mpg, type = i)))
## Demonstaration: weighted quantiles type 7 in R
wquantile7R <- function(x, w, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = TRUE) {
if(na.rm && anyNA(x)) { # Removing missing values (only in x)
cc = whichNA(x, invert = TRUE) # The C code first calls radixorder(x), which places
x = x[cc]; w = w[cc] # missing values last, so removing = early termination
}
if(anyv(w, 0)) { # Removing zero weights
nzw = whichv(w, 0, invert = TRUE) # In C, skipping zero weight order statistics is built
x = x[nzw]; w = w[nzw] # into the quantile algorithm, as outlined above
}
o = radixorder(x) # Ordering
wo = w[o]
w_cs = cumsum(wo) # Cumulative sum
sumwp = sum(w) # Computing sum(w) - min(w)
sumwp = sumwp - min(w)
sumwp = sumwp * probs # Target sums of weights for quantile type 7
res = sapply(sumwp, function(tsump) {
j = which.max(w_cs > tsump) # Lower order statistic
hl = (w_cs[j] - tsump) / wo[j] # Index weight of x[j] (h = 1 - hl)
hl * x[o[j]] + (1 - hl) * x[o[j+1L]] # Weighted quantile
})
if(names) names(res) = paste0(as.integer(probs * 100), "%")
res
} # Note: doesn't work for min and max. Overall the C code is significantly more rigorous.
wquantile7R(mtcars$mpg, mtcars$wt)
all.equal(wquantile7R(mtcars$mpg, mtcars$wt),
fquantile(mtcars$mpg, c(0.25, 0.5, 0.75), mtcars$wt))
## Efficient grouped quantile estimation: use .quantile for less call overhead
BY(mtcars$mpg, mtcars$cyl, .quantile, names = TRUE, expand.wide = TRUE)
BY(mtcars, mtcars$cyl, .quantile, names = TRUE)
library(magrittr)
mtcars |> fgroup_by(cyl) |> BY(.quantile)
## With weights
BY(mtcars$mpg, mtcars$cyl, .quantile, w = mtcars$wt, names = TRUE, expand.wide = TRUE)
BY(mtcars, mtcars$cyl, .quantile, w = mtcars$wt, names = TRUE)
mtcars |> fgroup_by(cyl) |> fselect(-wt) |> BY(.quantile, w = mtcars$wt)
mtcars |> fgroup_by(cyl) |> fsummarise(across(-wt, .quantile, w = wt))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.