weightedMedian: Weighted Median Value In matrixStats: Methods that apply to rows and columns of a matrix

Description

Computes a weighted median of a numeric vector.

Usage

 ```1 2``` ```weightedMedian(x, w, na.rm=NA, interpolate=is.null(ties), ties=NULL, method=c("quick", "shell"), ...) ```

Arguments

 `x` a `numeric` `vector` containing the values whose weighted median is to be computed. `w` a vector of weights the same length as `x` giving the weights to use for each element of `x`. Negative weights are treated as zero weights. Default value is equal weight to all values. `na.rm` a logical value indicating whether `NA` values in `x` should be stripped before the computation proceeds, or not. If `NA`, no check at all for `NA`s is done. Default value is `NA` (for effiency). `interpolate` If `TRUE`, linear interpolation is used to get a consistent estimate of the weighted median. `ties` If `interpolate == FALSE`, a character string specifying how to solve ties between two `x`'s that are satisfying the weighted median criteria. Note that at most two values can satisfy the criteria. When `ties` is `"min"`, the smaller value of the two is returned and when it is `"max"`, the larger value is returned. If `ties` is `"mean"`, the mean of the two values is returned and if it is `"both"`, both values are returned. Finally, if `ties` is `"weighted"` (or `NULL`) a weighted average of the two are returned, where the weights are weights of all values `x[i] <= x[k]` and `x[i] >= x[k]`, respectively. `method` If `"shell"`, then `order()` is used and when `method="quick"`, then internal `qsort()` is used. `...` Not used.

Details

For the `n` elements `x = c(x[1], x[2], ..., x[n])` with positive weights `w = c(w[1], w[2], ..., w[n])` such that `sum(w) = S`, the weighted median is defined as the element `x[k]` for which the total weight of all elements `x[i] < x[k]` is less or equal to `S/2` and for which the total weight of all elements `x[i] > x[k]` is less or equal to `S/2` (c.f. [1]).

If `w` is missing then all elements of `x` are given the same positive weight. If all weights are zero, `NA` is returned.

If one or more weights are `Inf`, it is the same as these weights have the same weight and the others has zero. This makes things easier for cases where the weights are result of a division with zero. In this case `median()` is used internally.

When all the weights are the same (after values with weight zero are excluded and `Inf`'s are taken care of), `median` is used internally.

The weighted median solves the following optimization problem:

α^* = \arg_α \min ∑_{k=1}{K} w_k |x_k-α|

where x=(x_1,x_2,…,x_K) are scalars and w=(w_1,w_2,…,w_K) are the corresponding "weights" for each individual x value.

Value

Returns a `numeric` scalar.

Benchmarks

When implementing this function speed has been highly prioritized and it also making use of the internal quick sort algorithm (from R v1.5.0). The result is that `weightedMedian(x)` is about half as slow as `median(x)`.

Initial test also indicates that `method="shell"`, which uses `order()` is slower than `method="quick"`, which uses internal `qsort()`. Non-weighted median can use partial sorting which is faster because all values do not have to be sorted.

See examples below for some simple benchmarking tests.

Author(s)

Henrik Bengtsson and Ola Hössjer, Centre for Mathematical Sciences, Lund University. Thanks to Roger Koenker, Econometrics, University of Illinois, for the initial ideas.

References

[1] T.H. Cormen, C.E. Leiserson, R.L. Rivest, Introduction to Algorithms, The MIT Press, Massachusetts Institute of Technology, 1989.

`median`, `mean`() and `weighted.mean`.
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55``` ```x <- 1:10 n <- length(x) m1 <- median(x) # 5.5 m2 <- weightedMedian(x) # 5.5 stopifnot(identical(m1, m2)) w <- rep(1, n) m1 <- weightedMedian(x, w) # 5.5 (default) m2 <- weightedMedian(x, ties="weighted") # 5.5 (default) m3 <- weightedMedian(x, ties="min") # 5 m4 <- weightedMedian(x, ties="max") # 6 stopifnot(identical(m1,m2)) # Pull the median towards zero w[1] <- 5 m1 <- weightedMedian(x, w) # 3.5 y <- c(rep(0,w[1]), x[-1]) # Only possible for integer weights m2 <- median(y) # 3.5 stopifnot(identical(m1,m2)) # Put even more weight on the zero w[1] <- 8.5 weightedMedian(x, w) # 2 # All weight on the first value w[1] <- Inf weightedMedian(x, w) # 1 # All weight on the last value w[1] <- 1 w[n] <- Inf weightedMedian(x, w) # 10 # All weights set to zero w <- rep(0, n) weightedMedian(x, w) # NA # Simple benchmarking bench <- function(N=1e5, K=10) { x <- rnorm(N) t <- c() t[1] <- system.time(for (k in 1:K) median(x))[3] t[2] <- system.time(for (k in 1:K) weightedMedian(x, method="quick"))[3] t[3] <- system.time(for (k in 1:K) weightedMedian(x, method="shell"))[3] t <- t / t[1] t[4] <- t[2]/t[3] names(t) <- c("median", "wMed-quick", "wMed-shell", "quick/shell") t } print(bench(N= 5, K=500)) print(bench(N= 50, K=500)) print(bench(N=200, K=200)) print(bench(N=1e5, K=5)) ```