Description Usage Arguments Details Value Benchmarks Author(s) References See Also Examples

View source: R/weightedMedian.R

Computes a weighted median of a numeric vector.

1 2 |

`x` |
a |

`w` |
a vector of weights the same length as |

`na.rm` |
a logical value indicating whether |

`interpolate` |
If |

`ties` |
If |

`method` |
If |

`...` |
Not used. |

For the `n`

elements `x = c(x[1], x[2], ..., x[n])`

with positive
weights `w = c(w[1], w[2], ..., w[n])`

such that `sum(w) = S`

,
the *weighted median* is defined as the element `x[k]`

for which
the total weight of all elements `x[i] < x[k]`

is less or equal to
`S/2`

and for which the total weight of all elements `x[i] > x[k]`

is less or equal to `S/2`

(c.f. [1]).

If `w`

is missing then all elements of `x`

are given the same
positive weight. If all weights are zero, `NA`

is returned.

If one or more weights are `Inf`

, it is the same as these weights
have the same weight and the others has zero. This makes things easier for
cases where the weights are result of a division with zero. In this case
`median()`

is used internally.

When all the weights are the same (after values with weight zero are excluded
and `Inf`

's are taken care of), `median`

is used internally.

The weighted median solves the following optimization problem:

*α^* = \arg_α \min ∑_{k=1}{K} w_k |x_k-α|*

where *x=(x_1,x_2,…,x_K)* are scalars and
*w=(w_1,w_2,…,w_K)* are the corresponding "weights" for
each individual *x* value.

Returns a `numeric`

scalar.

When implementing this function speed has been highly prioritized and
it also making use of the internal quick sort algorithm (from **R** v1.5.0).
The result is that `weightedMedian(x)`

is about half as slow as
`median(x)`

.

Initial test also indicates that `method="shell"`

, which uses
`order()`

is slower than `method="quick"`

, which uses internal
`qsort()`

. Non-weighted median can use partial sorting which is
faster because all values do not have to be sorted.

See examples below for some simple benchmarking tests.

Henrik Bengtsson and Ola Hössjer, Centre for Mathematical Sciences, Lund University. Thanks to Roger Koenker, Econometrics, University of Illinois, for the initial ideas.

[1] T.H. Cormen, C.E. Leiserson, R.L. Rivest, Introduction to Algorithms, The MIT Press, Massachusetts Institute of Technology, 1989.

`median`

, `mean`

() and `weighted.mean`

.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 | ```
x <- 1:10
n <- length(x)
m1 <- median(x) # 5.5
m2 <- weightedMedian(x) # 5.5
stopifnot(identical(m1, m2))
w <- rep(1, n)
m1 <- weightedMedian(x, w) # 5.5 (default)
m2 <- weightedMedian(x, ties="weighted") # 5.5 (default)
m3 <- weightedMedian(x, ties="min") # 5
m4 <- weightedMedian(x, ties="max") # 6
stopifnot(identical(m1,m2))
# Pull the median towards zero
w[1] <- 5
m1 <- weightedMedian(x, w) # 3.5
y <- c(rep(0,w[1]), x[-1]) # Only possible for integer weights
m2 <- median(y) # 3.5
stopifnot(identical(m1,m2))
# Put even more weight on the zero
w[1] <- 8.5
weightedMedian(x, w) # 2
# All weight on the first value
w[1] <- Inf
weightedMedian(x, w) # 1
# All weight on the last value
w[1] <- 1
w[n] <- Inf
weightedMedian(x, w) # 10
# All weights set to zero
w <- rep(0, n)
weightedMedian(x, w) # NA
# Simple benchmarking
bench <- function(N=1e5, K=10) {
x <- rnorm(N)
t <- c()
t[1] <- system.time(for (k in 1:K) median(x))[3]
t[2] <- system.time(for (k in 1:K) weightedMedian(x, method="quick"))[3]
t[3] <- system.time(for (k in 1:K) weightedMedian(x, method="shell"))[3]
t <- t / t[1]
t[4] <- t[2]/t[3]
names(t) <- c("median", "wMed-quick", "wMed-shell", "quick/shell")
t
}
print(bench(N= 5, K=500))
print(bench(N= 50, K=500))
print(bench(N=200, K=200))
print(bench(N=1e5, K=5))
``` |

matrixStats documentation built on May 31, 2017, 2:26 a.m.

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.