med: Multivariate median

Description

Computes the median of a multivariate data set.

Usage

med(x, method = "Tukey", approx = FALSE, eps = 1e-8, maxit = 200,
   mustdith = FALSE, maxdith = 50, dithfactor = 10, factor = 0.8, 
   nstp = NULL, ntry = NULL, nalt = NULL, 
   ndir = 1000)

Arguments

x

The data as a matrix, data frame or list. If it is a matrix or data frame, then each row is viewed as one multivariate observation. If it is a list, all components must be numerical vectors of equal length (coordinates of observations).

method

Character string which determines the depth function used. method can be "Tukey" (the default), "Liu", "Oja", "Spatial" or "CWmed".

approx

Logical. Should an approximate Tukey median be computed? Relevant in dimension 2 only, where it is useful when the sample size is large.

eps

Error tolerance to control the calculation.

maxit

Maximum number of Newton-Raphson iterations when method is "Spatial".

mustdith

Logical. Should dithering be applied? Used to compute the Tukey median when the data set is not in general position or a numerical problem is encountered.

maxdith

Integer. Maximum number of dithering steps.

dithfactor

Scaling factor used for horizontal and vertical dithering.

factor

Proportion (0 to 1) of outermost contours computed according to algorithm HALFMED of Rousseeuw and Ruts (1998); remaining contours derived from an algorithm in Rousseeuw et al. (1999).

nstp

Positive integer. Maximum number of steps in the iteration process leading to an approximate value of the Tukey median. If NULL, the default value is taken to be the largest integer not greater than 5 n^{0.3}p, where n is the number of observations and p the dimension.

ntry

Positive integer. Maximum number of steps without an increase of the Tukey depth in the iteration process leading to an approximate value of the Tukey median. If NULL, the default value is taken to be 10(p+1), where p is the dimension.

nalt

Positive integer. Maximum number of consecutive steps without an increase of the Tukey depth at any time in the iteration process leading to an approximate value of the Tukey median. If NULL, the default value is taken to be 4(p+1), where p is the dimension.

ndir

Positive integer. Number of random directions used in the iteration process leading to an approximate value of the Tukey median.
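The default values of nstp, ntry and nalt described above can be worked out by hand. The following is a minimal sketch in base R; the helper name tukey_approx_defaults is hypothetical and is not part of the package. It only evaluates the stated formulas for a given n and p. For a 24 x 4 data matrix it gives nstp = 51, ntry = 50 and nalt = 20.

```r
# Hypothetical helper (not part of the depth package): evaluates the
# documented default formulas for the approximate Tukey median.
tukey_approx_defaults <- function(n, p) {
  list(
    nstp = floor(5 * n^0.3 * p),  # largest integer not greater than 5 n^0.3 p
    ntry = 10 * (p + 1),          # max steps without an increase in depth
    nalt = 4 * (p + 1)            # max consecutive steps without an increase
  )
}

# 24 observations in dimension 4
defaults <- tukey_approx_defaults(n = 24, p = 4)
print(defaults)
```

The value nstp = 51 agrees with the warning "nstep =  51" shown in the example output for the four-dimensional data set.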

Details

method "Tukey" computes the Tukey median. The calculation is exact in dimensions 1 and 2 and approximate in higher dimensions. The bivariate case uses algorithm HALFMED by Rousseeuw and Ruts (1998) together with an algorithm from Rousseeuw et al. (1999); argument factor determines how the work is split between them. If n is the number of observations, contours of depth at most factor × n/2 (the outermost contours) are derived from algorithm HALFMED, while the remaining contours are obtained from the second algorithm. The higher-dimensional case is covered by Fortran code from Struyf and Rousseeuw (2000).

When method is "Tukey", the data must be in general position. If they are not, dithering can be used in dimension 2: random noise is added to each component of each observation. The noise is eps × dithfactor × U for the horizontal component and eps × dithfactor × V for the vertical component, where U and V are independent and uniform on [-0.5, 0.5]. Dithering is applied in a number of consecutive steps (at most maxdith), each using independent U's and V's.
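A single dithering step of the kind described above might look as follows. This is an illustrative sketch in base R, assuming U and V are uniform on [-0.5, 0.5]; the function name dither_once is hypothetical, and the package's internal implementation may differ.

```r
# Hypothetical sketch of one dithering step for bivariate data
# (not the package's internal code).
dither_once <- function(x, eps = 1e-8, dithfactor = 10) {
  x <- as.matrix(x)
  n <- nrow(x)
  u <- runif(n, -0.5, 0.5)                 # noise for the horizontal component
  v <- runif(n, -0.5, 0.5)                 # noise for the vertical component
  x[, 1] <- x[, 1] + eps * dithfactor * u
  x[, 2] <- x[, 2] + eps * dithfactor * v
  x
}

# Duplicated points, so the data are not in general position
set.seed(1)
pts <- matrix(c(1, 1,
                1, 1,
                2, 3,
                4, 2), ncol = 2, byrow = TRUE)
dithered <- dither_once(pts)
max(abs(dithered - pts))   # bounded by eps * dithfactor / 2
```

With the default eps and dithfactor the perturbation is at most 5e-8 per coordinate, so it breaks ties without moving the observations noticeably.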

method "Liu" computes the Liu median. It is based on Fortran code from Rousseeuw and Ruts (1996) and restricted to two-dimensional data.

method "Oja" computes the Oja median. It is based on Fortran code by Niinimaa et al. (1992) and restricted to two-dimensional data.

method "Spatial" computes the spatial median or mediancentre. It is based on Fortran code by Gower (1974), and Bedall and Zimmermann (1979).
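For intuition, the spatial median is the point that minimises the sum of Euclidean distances to the observations. The sketch below approximates it in base R with Weiszfeld's fixed-point iteration, which is a different technique from the Newton-Raphson Fortran code used by the package. The function name spatial_median_sketch is hypothetical, and its eps and maxit arguments only mimic the roles of the arguments of med.

```r
# Hypothetical sketch using Weiszfeld's iteration
# (not the algorithm implemented in the package).
spatial_median_sketch <- function(x, eps = 1e-8, maxit = 200) {
  x <- as.matrix(x)
  m <- colMeans(x)                          # start from the mean vector
  for (i in seq_len(maxit)) {
    d <- sqrt(rowSums(sweep(x, 2, m)^2))    # distances to the current estimate
    d <- pmax(d, .Machine$double.eps)       # guard against division by zero
    w <- 1 / d
    m_new <- colSums(x * w) / sum(w)        # weighted mean, weights 1 / distance
    converged <- sqrt(sum((m_new - m)^2)) < eps
    m <- m_new
    if (converged) break
  }
  m
}

# Corners of a square: by symmetry the spatial median is the centre (1, 1)
corners <- matrix(c(0, 0,
                    2, 0,
                    0, 2,
                    2, 2), ncol = 2, byrow = TRUE)
centre <- spatial_median_sketch(corners)
print(centre)
```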

method "CWmed" computes the coordinatewise median.
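The coordinatewise median is simply the vector of univariate medians, one per column. A minimal base R sketch, with a hypothetical helper name cw_median that is not part of the package:

```r
# Hypothetical helper: the median of each coordinate taken separately.
cw_median <- function(x) {
  apply(as.matrix(x), 2, median)
}

obs <- matrix(c(1, 4,
                2, 5,
                9, 6), ncol = 2, byrow = TRUE)
cw <- cw_median(obs)
print(cw)   # medians of columns (1, 2, 9) and (4, 5, 6)
```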

Value

A list with components

median

the median

depth

the depth of the median (omitted when method is "Spatial" or "CWmed")

Author(s)

Jean-Claude Masse and Jean-Francois Plante, based on Fortran code by authors listed in the references.

References

Gower, J.C. (1974), AS 78: The Mediancentre, Appl. Stat., 23, 466–470.

Bedall, F.K. and Zimmermann, H. (1979), AS 143: The Mediancentre, Appl. Stat., 28, 325–328.

Niinimaa, A., Oja, H., and Nyblom, J. (1992), AS 277: The Oja Bivariate Median, Appl. Stat., 41, 611–617.

Rousseeuw, P.J. and Ruts, I. (1996), Algorithm AS 307: Bivariate location depth, Appl. Stat., 45, 516–526.

Rousseeuw, P.J. and Ruts, I. (1998), Constructing the bivariate Tukey median, Stat. Sinica, 8, 828–839.

Rousseeuw, P.J., Ruts, I., and Tukey, J.W. (1999), The Bagplot: A Bivariate Boxplot, The Am. Stat., 53, 382–387.

Small, C.G. (1990), A survey of multidimensional medians, Int. Statist. Rev., 58, 263–277.

Struyf, A. and Rousseeuw, P.J. (2000), High-dimensional computation of the deepest location, Comput. Statist. Data Anal., 34, 415–436.

Masse, J.C. and Plante, J.F. (2003), A Monte Carlo study of the accuracy and robustness of ten bivariate location estimators, Comput. Statist. Data Anal., 42, 1–26.

See Also

trmean and ctrmean for trimmed means

Examples

## exact Tukey median for a mixture of bivariate normals
library(depth)  # provides med()
library(MASS)   # provides mvrnorm()
set.seed(159)
mu1 <- c(0, 0); mu2 <- c(6, 0); sigma <- matrix(c(1, 0, 0, 1), ncol = 2)
mixbivnorm <- rbind(mvrnorm(80, mu1, sigma), mvrnorm(20, mu2, sigma))
med(mixbivnorm)

## approximate Tukey median of a four-dimensional data set
set.seed(601)
zz <- matrix(rnorm(96), ncol = 4)
med(zz)

## data set not in general position
data(starsCYG, package = "robustbase")
med(starsCYG, method = "Liu")

## use of dithering for the Tukey median
med(starsCYG, mustdith = TRUE)

Example output

$median
[1]  0.5136701 -0.1909707

$depth
[1] 0.42

$median
[1] -0.25355415 -0.08901201 -0.00909997  0.43951226

$depth
[1] 0.3333333

Warning messages:
1: In med(zz) :
  Tukey's median can be calculated exactly on bivariate samples only.
2: In med(zz) : Reach maximum number of iterations: nstep =  51
$median
[1] 4.45 5.22

$depth
[1] 0.3168671

$median
[1] 4.405786 5.014595

$depth
[1] 0.4042553

Warning message:
In med(starsCYG, mustdith = TRUE) :
  Data are not in general position. Dithering was used.

depth documentation built on Nov. 21, 2019, 5:06 p.m.