hdepth: Halfspace depth of points relative to a dataset

hdepth    R Documentation

Description

Computes the halfspace depth of p-dimensional points z relative to a p-dimensional dataset x. Computation is exact for p ≤ 3 and approximate when p > 3. The approximate algorithm computes the halfspace depth as the minimal univariate halfspace depth over many directions. To obtain the univariate halfspace depth in a direction v, the dataset x and the points z_i are projected on v, and the univariate location depth of each projected point v'z_i relative to the projected data xv is computed.
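
The projection idea described above can be sketched in a few lines of base R. This is an illustration only, not the package's C++ implementation: the helper names depth1d and approx_hdepth and the argument dirs are hypothetical, the directions are simply random normal vectors, and depth is reported as a count of observations (no claim is made here about how hdepth scales its output).

```r
# Univariate halfspace depth of each value in t relative to the sample y:
# the smaller of the number of observations <= t and the number >= t.
depth1d <- function(t, y) {
  vapply(t, function(ti) min(sum(y <= ti), sum(y >= ti)), integer(1))
}

# Projection approximation: minimum univariate depth over the columns of
# dirs, a p by ndir matrix of directions (hypothetical argument).
approx_hdepth <- function(x, z, dirs) {
  depth <- rep(nrow(x), nrow(z))
  for (k in seq_len(ncol(dirs))) {
    v <- dirs[, k]
    depth <- pmin(depth, depth1d(drop(z %*% v), drop(x %*% v)))
  }
  depth
}

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)          # n = 100, p = 2
z <- rbind(c(0, 0), c(10, 10))             # a central and a remote point
dirs <- matrix(rnorm(2 * 500), nrow = 2)   # 500 random directions
d <- approx_hdepth(x, z, dirs)
print(d)
```

Because only finitely many directions are inspected, the value obtained this way can never be smaller than the true halfspace depth; adding directions can only tighten it.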

Usage

hdepth(x, z = NULL, options = list())

Arguments

x

An n by p data matrix with observations in the rows and variables in the columns.

z

An optional m by p matrix containing rowwise the points z_i for which to compute the halfspace depth. If z is not specified, it is set equal to x.

options

A list of available options:

  • type
    Determines the desired type of invariance for the approximate algorithm and should be one of "Affine", "Rotation" or "Shift". When the option "Affine" is used, the directions v are orthogonal to hyperplanes spanned by p observations from x. When the option "Rotation" is used, the directions pass by two randomly selected observations from x. With the option "Shift", directions are randomly generated.
    Defaults to "Affine".

  • ndir
    Determines the number of directions v by setting ndir to a specific number or to "all". In the latter case, an exhaustive search over all possible directions (according to type) is performed. If ndir is larger than the number of possible directions, the algorithm automatically switches to the exhaustive search ("all").
    Defaults to 250p when type="Affine", to 5000 when type="Rotation" and to 12500 when type="Shift".

  • approx
    The user may force approximate calculation in two or three dimensions by setting this option to TRUE.
    Defaults to FALSE.

  • seed
    A strictly positive integer specifying the seed to be used by the C++ code.
    Defaults to 10.
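
To make the three values of type more concrete, the following base R sketch constructs one direction of each kind. The helper names dir_shift, dir_rotation and dir_affine are hypothetical and the code is only an assumption about how such directions can be built from the description above; it does not reproduce the package's C++ routines or its handling of degenerate subsets.

```r
set.seed(10)
x <- matrix(rnorm(60), ncol = 3)   # n = 20 observations, p = 3
n <- nrow(x)
p <- ncol(x)

unit <- function(v) v / sqrt(sum(v^2))

# "Shift": a randomly generated direction.
dir_shift <- function(p) unit(rnorm(p))

# "Rotation": the direction through two observations (rows of pts).
# Undefined when the two points coincide, e.g. because of ties.
dir_rotation <- function(pts) unit(pts[1, ] - pts[2, ])

# "Affine": a direction orthogonal to the hyperplane through the
# p observations in the rows of pts.
dir_affine <- function(pts) {
  p <- ncol(pts)
  # The p - 1 differences to the first point span the hyperplane.
  d <- sweep(pts[-1, , drop = FALSE], 2, pts[1, ])
  # The last right singular vector spans the null space of d
  # (unique up to sign only if d has rank p - 1).
  svd(d, nu = 0, nv = p)$v[, p]
}

v_shift    <- dir_shift(p)
v_rotation <- dir_rotation(x[sample(n, 2), ])
v_affine   <- dir_affine(x[sample(n, p), ])
```

A p-subset whose differences have rank below p - 1 has no unique orthogonal direction, which is the situation counted by the singularSubsets output.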

Details

Halfspace depth was introduced by Tukey (1975). The halfspace depth of a point z_i is defined as the minimal number of observations from x that are contained in any closed halfspace with boundary through z_i.

In dimensions p=2 and p=3 the computations are by default carried out exactly using the algorithms described in Rousseeuw and Ruts (1996) and Rousseeuw and Struyf (1998). This yields an affine invariant measure of depth. Approximate algorithms are also implemented which are affine, rotation or shift invariant, depending on the value chosen for type. They can be used in any dimension. The shift invariant algorithm coincides with the random Tukey depth (Cuesta-Albertos and Nieto-Reyes, 2008).

The resulting halfspace depth values are invariant to affine transformations when the exact algorithm is used. With the approximate algorithms they are invariant to affine transformations, rotations or shifts, depending on the choice for type, provided that the seed is kept fixed across runs of the algorithm. Note that the halfspace depth values are guaranteed not to increase when more directions are considered, provided the seed is kept fixed, as this ensures that the random directions are generated in a fixed order.
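
The role of the seed can be illustrated with a small base R sketch. It assumes, purely for illustration, that directions are random normal vectors drawn one after another; the helper names depth1d and approx_depth are hypothetical and this is not the package's generator. With a fixed seed the first 10 directions are a subset of the first 500, so the minimum over 500 directions cannot exceed the minimum over 10.

```r
depth1d <- function(t, y) {
  vapply(t, function(ti) min(sum(y <= ti), sum(y >= ti)), integer(1))
}

# Approximate depth of the rows of x relative to x itself, using the
# first ndir directions generated from the given seed.
approx_depth <- function(x, ndir, seed) {
  set.seed(seed)
  # Columns are filled in order, so direction k is the same for every
  # ndir >= k as long as the seed is the same.
  dirs <- matrix(rnorm(ncol(x) * ndir), nrow = ncol(x))
  depth <- rep(nrow(x), nrow(x))
  for (k in seq_len(ndir)) {
    proj <- drop(x %*% dirs[, k])
    depth <- pmin(depth, depth1d(proj, proj))
  }
  depth
}

set.seed(2)
x <- matrix(rnorm(100), ncol = 2)   # n = 50, p = 2

d10  <- approx_depth(x, ndir = 10,  seed = 10)
d500 <- approx_depth(x, ndir = 500, seed = 10)
all(d500 <= d10)   # nested direction sets: depth cannot increase

# With a different seed the two direction sets are unrelated, so no
# such ordering is guaranteed.
d50_other <- approx_depth(x, ndir = 50, seed = 897)
```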

If the halfspace depth needs to be computed for m points z_i, it is recommended to apply the function once with the matrix z as input, instead of applying it m times with input vectors z_i, as numerous computations can be saved. In that case the approximate algorithms automatically also compute the depth values of the observations in x. When only the halfspace depth of the observations in x is required, the call to the function should be hdepth(x) or equivalently hdepth(x,x). In that case the depth values will be stored in the 'depthZ' output field. For bivariate data these will be the exact values by default.

To visualize the depth of bivariate data one can apply the mrainbowplot function. It plots the data colored according to their depth.

The function first checks whether the data lie in a subspace of dimension smaller than p. If so, a warning is given, together with the dimension of the subspace and a direction that is orthogonal to it.
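
One way to picture such a check is a singular value decomposition of the centered data. The sketch below is an assumption about how a subspace test could be done in base R, not the check performed by the package; the variable names dimension and hyperplane are borrowed from the output fields purely for readability, and the tolerance is an arbitrary choice.

```r
set.seed(3)
# Hypothetical data: the third variable is the sum of the first two,
# so all observations lie in a two-dimensional plane.
x <- matrix(rnorm(40), ncol = 2)
x <- cbind(x, x[, 1] + x[, 2])

xc <- sweep(x, 2, colMeans(x))      # center the columns
sv <- svd(xc)
tol <- 1e-8 * max(sv$d)             # arbitrary relative tolerance
dimension <- sum(sv$d > tol)        # dimension of the spanned subspace

if (dimension < ncol(x)) {
  # A right singular vector with (numerically) zero singular value
  # is orthogonal to the subspace containing the centered data.
  hyperplane <- sv$v[, ncol(x)]
  message("Data lie in a subspace of dimension ", dimension)
}
```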

Value

A list with components:

depthX

Vector of length n giving the halfspace depth of the observations in x.
By default exact if p ≤ 3, and approximate if p > 3 or the option approx is set to TRUE.

depthZ

Vector of length m giving the halfspace depth of the points in z relative to x.
By default exact if p ≤ 3, and approximate if p > 3 or the option approx is set to TRUE.

singularSubsets

When the input parameter type is equal to "Affine", the number of p-subsets that span a subspace of dimension smaller than p-1. In that case the orthogonal direction cannot be uniquely determined. This is an indication that the data are not in general position. When the input parameter type is equal to "Rotation", two randomly selected points of the data may coincide due to ties in the data. In that case this value counts how many times this happens.

dimension

When the data x are lying in a lower dimensional subspace, the dimension of this subspace.

hyperplane

When the data x are lying in a lower dimensional subspace, a direction orthogonal to this subspace.

Author(s)

P. Segaert based on Fortran code by P.J. Rousseeuw, I. Ruts and A. Struyf, and C++ code by P. Segaert and K. Vakili.

References

Tukey J. (1975). Mathematics and the picturing of data. Proceedings of the International Congress of Mathematicians, 2, 523–531, Vancouver.

Rousseeuw P.J., Ruts I. (1996). AS 307: Bivariate location depth. Journal of the Royal Statistical Society: Series C, 45, 516–526.

Rousseeuw P.J., Struyf A. (1998). Computing location depth and regression depth in higher dimensions. Statistics and Computing, 8, 193–203.

Cuesta-Albertos J., Nieto-Reyes A. (2008). The random Tukey depth. Computational Statistics & Data Analysis, 52, 4979–4988.

See Also

hdepthmedian, mrainbowplot, bagdistance, bagplot

Examples

# Compute the halfspace depth of a simple
# two-dimensional dataset. 
data(cardata90)
Result <- hdepth(x = cardata90)
mrainbowplot(cardata90, depths = Result$depthZ)

# In two dimensions we may also opt to use the
# approximate algorithm. The number of directions
# may be specified through the option list.
options <- list(type = "Rotation",
                ndir = 750,
                approx = TRUE)
Result <- hdepth(x = cardata90, options = options)
# With a fixed seed, the resulting halfspace depth is
# non-increasing in the number of directions, so no index
# is expected to show a smaller depth for ndir = 10.
options <- list(type = "Rotation",
                ndir = 10,
                approx = TRUE)
Result1 <- hdepth(x = cardata90, options = options)
options <- list(type = "Rotation",
                ndir = 500,
                approx = TRUE)
Result2 <- hdepth(x = cardata90, options = options)
which(Result1$depthZ - Result2$depthZ < 0)
# This is however not the case when the seed is changed
options <- list(type = "Rotation",
                ndir = 10,
                approx = TRUE)
Result1 <- hdepth(x = cardata90, options = options)
options <- list(type = "Rotation",
                ndir = 50,
                approx = TRUE,
                seed = 897)
Result2 <- hdepth(x = cardata90, options = options)
which(Result1$depthZ - Result2$depthZ < 0)
plot(Result1$depthZ - Result2$depthZ,
     xlab = "Index", ylab = "Difference in halfspace depth")

# We can also consider directions through two data
# points. If the sample is small enough one may opt
# to search over all choose(n,2) directions.
# Note that the computational load increases substantially
# as n becomes larger.
options <- list(type = "Rotation",
                ndir = "all",
                approx = TRUE)
Result1 <- hdepth(x = cardata90, options = options)

# Alternatively one may consider randomly generated directions.
options <- list(type = "Shift",
                ndir = 250,
                approx = TRUE)
Result1 <- hdepth(x = cardata90, options = options)

mrfDepth documentation built on May 29, 2024, 5:04 a.m.