View source: R/weightedMedian.R
weightedMedian | R Documentation |
Computes a weighted median of a numeric vector.
weightedMedian(x, w = NULL, idxs = NULL, na.rm = FALSE,
interpolate = is.null(ties), ties = NULL, ...)
x |
|
w |
a vector of weights the same length as |
idxs |
A |
na.rm |
a logical value indicating whether |
interpolate |
If |
ties |
If |
... |
Not used. |
Returns a numeric
scalar.
For the n
elements x = c(x[1], x[2], ..., x[n])
with positive
weights w = c(w[1], w[2], ..., w[n])
such that sum(w) = S
, the
weighted median is defined as the element x[k]
for which the
total weight of all elements x[i] < x[k]
is less or equal to
S/2
and for which the total weight of all elements x[i] > x[k]
is less or equal to S/2
(c.f. [1]).
When using linear interpolation, the weighted mean of x[k-1]
and
x[k]
with weights S[k-1]
and S[k]
corresponding to the
cumulative weights of those two elements is used as an estimate.
If w
is missing then all elements of x
are given the same
positive weight. If all weights are zero, NA_real_
is
returned.
If one or more weights are Inf
, it is the same as these weights have
the same weight and the others have zero. This makes things easier for cases
where the weights are result of a division with zero.
If there are missing values in w
that are part of the calculation
(after subsetting and dropping missing values in x
), then the final
result is always NA
of the same type as x
.
The weighted median solves the following optimization problem:
\alpha^* = \arg_\alpha \min \sum_{i = 1}^{n} w_i |x_i-\alpha|
where
x = (x_1, x_2, \ldots, x_n)
are scalars and
w = (w_1, w_2, \ldots, w_n)
are the corresponding "weights" for each
individual x
value.
Henrik Bengtsson and Ola Hossjer, Centre for Mathematical Sciences, Lund University. Thanks to Roger Koenker, Econometrics, University of Illinois, for the initial ideas.
[1] T.H. Cormen, C.E. Leiserson, R.L. Rivest, Introduction to Algorithms, The MIT Press, Massachusetts Institute of Technology, 1989.
median
, mean
() and
weightedMean
().
x <- 1:10
n <- length(x)
m1 <- median(x) # 5.5
m2 <- weightedMedian(x) # 5.5
stopifnot(identical(m1, m2))
w <- rep(1, times = n)
m1 <- weightedMedian(x, w) # 5.5 (default)
m2 <- weightedMedian(x, ties = "weighted") # 5.5 (default)
m3 <- weightedMedian(x, ties = "min") # 5
m4 <- weightedMedian(x, ties = "max") # 6
stopifnot(identical(m1, m2))
# Pull the median towards zero
w[1] <- 5
m1 <- weightedMedian(x, w) # 3.5
y <- c(rep(0, times = w[1]), x[-1]) # Only possible for integer weights
m2 <- median(y) # 3.5
stopifnot(identical(m1, m2))
# Put even more weight on the zero
w[1] <- 8.5
weightedMedian(x, w) # 2
# All weight on the first value
w[1] <- Inf
weightedMedian(x, w) # 1
# All weight on the last value
w[1] <- 1
w[n] <- Inf
weightedMedian(x, w) # 10
# All weights set to zero
w <- rep(0, times = n)
weightedMedian(x, w) # NA
# Simple benchmarking
bench <- function(N = 1e5, K = 10) {
x <- rnorm(N)
gc()
t <- c()
t[1] <- system.time(for (k in 1:K) median(x))[3]
t[2] <- system.time(for (k in 1:K) weightedMedian(x))[3]
t <- t / t[1]
names(t) <- c("median", "weightedMedian")
t
}
print(bench(N = 5, K = 100))
print(bench(N = 50, K = 100))
print(bench(N = 200, K = 100))
print(bench(N = 1000, K = 100))
print(bench(N = 10e3, K = 20))
print(bench(N = 100e3, K = 20))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.