bagdistance | R Documentation |
Computes the bagdistance of p-dimensional points z relative to a p-dimensional dataset x. To compute the bagdistance of a point z_i, first the bag of x is computed as the depth region containing the 50% of observations (of x) with largest halfspace depth. Next, the ray from the halfspace median \theta through z_i is considered, and c_z is defined as the intersection of this ray with the boundary of the bag. The bagdistance of z_i to x is then given by the ratio of the Euclidean distance from z_i to the halfspace median to the Euclidean distance from c_z to the halfspace median.
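As an illustration of the ratio described above, the following base R sketch works through the univariate case. It is not the package's implementation: the helper names `hdepth1d` and `bagdistance1d` are hypothetical, and the bag is simplified to the range of the 50% deepest observations, which may differ from how the package constructs the bag.

```r
# Univariate halfspace depth of each point of z relative to the sample x.
hdepth1d <- function(z, x) {
  sapply(z, function(zi) min(mean(x <= zi), mean(x >= zi)))
}

# Simplified univariate bagdistance (illustration only, hypothetical helper).
bagdistance1d <- function(x, z = x) {
  depth_x <- hdepth1d(x, x)
  center  <- median(x)                     # the median maximizes univariate depth
  # Assumption: the bag is the range of the 50% deepest observations.
  n_bag   <- ceiling(length(x) / 2)
  deepest <- x[order(depth_x, decreasing = TRUE)[seq_len(n_bag)]]
  bag     <- range(deepest)
  # c_z is the bag endpoint on the same side of the center as z.
  c_z <- ifelse(z >= center, bag[2], bag[1])
  abs(z - center) / abs(c_z - center)
}

set.seed(1)
x  <- rnorm(201)
bd <- bagdistance1d(x, z = c(median(x), max(x), 10))
```

In this sketch a point at the center has bagdistance 0, a point on the boundary of the bag has bagdistance 1, and points further out along the same ray have larger values.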
bagdistance(x, z = NULL, options = list())
x |
An |
z |
An optional |
options |
A list of available options:
|
The bagdistance was introduced in Hubert et al. (2015) and studied in Hubert et al. (2017). It does not assume symmetry and is affine invariant. Note that when the halfspace depth is not computed in an affine invariant way, the bagdistance cannot be affine invariant either.
The function first computes the halfspace depth and the halfspace median of x. Additional options may be passed to the hdepth routine by specifying them in the options list argument.
It is first checked whether the data lie in a subspace of dimension smaller than p. If so, a warning is given, as well as the dimension of the subspace and a direction which is orthogonal to it.
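A check of this kind can be sketched in base R with a singular value decomposition of the centered data. This is only an illustration under stated assumptions: the helper `check_subspace`, its tolerance `tol`, and the names of its return components are hypothetical, and the package may detect degeneracy differently.

```r
# Detect whether the rows of x lie in an affine subspace of dimension < p.
# Assumes at least as many observations as variables (n >= p).
check_subspace <- function(x, tol = 1e-8) {
  xc <- scale(x, center = TRUE, scale = FALSE)   # center the columns
  sv <- svd(xc)
  # Count singular values that are non-negligible relative to the largest.
  dimension <- sum(sv$d > tol * max(sv$d))
  # The right singular vector of the smallest singular value is a direction
  # orthogonal to the subspace when the data are degenerate.
  direction <- if (dimension < ncol(x)) sv$v[, ncol(x)] else NULL
  list(dimension = dimension, direction = direction)
}

set.seed(2)
u <- rnorm(100)
v <- rnorm(100)
x_flat <- cbind(u, v, u + v)     # third column is a linear combination
res <- check_subspace(x_flat)
```

Here the three-dimensional data lie in a plane, so the reported dimension is 2 and the centered data have (numerically) zero projection on the returned direction.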
Depending on the dimension, different algorithms are used. For p=1 the bagdistance is computed exactly. For p=2 the default setting (options$approx=TRUE) uses an approximate algorithm. Exact computation, based on the exact algorithm for computing the contours of the bag (see the depthContour function), is obtained by setting options$approx to FALSE. Note that this may increase computation time.
For the approximate algorithm, the intersection point c_z is approximated by searching along each ray for the point whose depth equals the median of the depth values of x. As the halfspace depth is monotonically decreasing along the ray, a bisection algorithm is used. Starting limits are obtained by projecting the data on the direction and considering the data point with univariate depth corresponding to the median of the halfspace depths of x. By definition, the multivariate depth of this point is lower than or equal to its univariate depth. A second limit is obtained by considering the deepest location estimate. The maximum number of bisection iterations can be specified through the options argument max.iter.
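The bisection search described above can be sketched in base R for bivariate data. This is a rough illustration, not the package's algorithm: `approx_hdepth` and `bisect_ray` are hypothetical helpers, the depth is approximated over a fixed grid of directions, the deepest sample point stands in for the halfspace median, and the starting upper limit is simply a point far outside the data rather than the projection-based limit mentioned in the text.

```r
# Approximate halfspace depth of a point z (length 2) relative to data (n x 2):
# the minimum univariate depth over a fixed set of directions (2 x ndir).
approx_hdepth <- function(z, data, dirs) {
  px <- data %*% dirs                  # projections of the data, n x ndir
  pz <- as.vector(z %*% dirs)          # projections of z, length ndir
  below <- colMeans(sweep(px, 2, pz, "<="))
  above <- colMeans(sweep(px, 2, pz, ">="))
  min(pmin(below, above))
}

# Bisection along the ray center + t * direction for the largest t whose
# depth is still at least `target`.
bisect_ray <- function(center, direction, data, dirs, target,
                       max.iter = 100, tol = 1e-8) {
  lower <- 0                           # at the center the depth is >= target
  # Assumption: twice the largest distance to the center is far enough out
  # for the depth to have dropped below the target.
  upper <- 2 * max(sqrt(rowSums(sweep(data, 2, center)^2)))
  for (iter in seq_len(max.iter)) {
    mid <- (lower + upper) / 2
    if (approx_hdepth(center + mid * direction, data, dirs) >= target) {
      lower <- mid
    } else {
      upper <- mid
    }
    if (upper - lower < tol) break
  }
  list(t = (lower + upper) / 2, converged = (upper - lower < tol))
}

set.seed(3)
X <- matrix(rnorm(400), ncol = 2)
angles <- seq(0, pi, length.out = 181)[-181]
dirs <- rbind(cos(angles), sin(angles))
depth_X <- vapply(seq_len(nrow(X)),
                  function(i) approx_hdepth(X[i, ], X, dirs), numeric(1))
center <- X[which.max(depth_X), ]      # stand-in for the halfspace median
target <- median(depth_X)              # depth level defining the bag boundary

z <- c(3, 1)
direction <- (z - center) / sqrt(sum((z - center)^2))
hit <- bisect_ray(center, direction, X, dirs, target)
c_z <- center + hit$t * direction      # approximate intersection with the bag
bagdist_z <- sqrt(sum((z - center)^2)) / hit$t
```

The `converged` component in this sketch only records whether the search interval shrank below `tol` within `max.iter` iterations.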
An observation from z is flagged as an outlier if its bagdistance exceeds a cutoff value. This cutoff is equal to the square root of the 0.99 quantile of the chi-squared distribution with p degrees of freedom.
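The cutoff rule above can be reproduced directly in base R. The bagdistance values below are made up for illustration, and `is_outlier` is a hypothetical name, not a component of the function's return value.

```r
p <- 2                                   # dimension of the data
cutoff <- sqrt(qchisq(0.99, df = p))     # square root of the 0.99 chi-squared quantile

# Hypothetical bagdistance values, for illustration only.
bd <- c(0.4, 1.1, 2.9, 3.5)
is_outlier <- bd > cutoff
```

For p = 2 the cutoff is roughly 3.03, so only the last of these values exceeds it.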
A list with components:
bagdistance |
The bagdistance of the points of |
cutoff |
Points of |
flag |
Points of |
converged |
Vector of length |
dimension |
When the data |
hyperplane |
When the data |
P. Segaert.
Hubert M., Rousseeuw P.J., Segaert P. (2015). Multivariate functional outlier detection. Statistical Methods & Applications, 24, 177–202.
Hubert M., Rousseeuw P.J., Segaert P. (2017). Multivariate and functional classification using depth and distance. Advances in Data Analysis and Classification, 11, 445–466.
depthContour, hdepth, bagplot
# Generate some bivariate data
set.seed(5)
nObs <- 500
XS <- matrix(rnorm(nObs * 2), nrow = nObs, ncol = 2)
A <- matrix(c(1,1,.5,.1), ncol = 2, nrow = 2)
X <- XS %*% A
# In two dimensions we may either use the approximate
# or the exact algorithm to compute the bag.
respons.exact <- bagdistance(x = X, options = list(approx = FALSE))
respons.approx <- bagdistance(x = X, options = list(approx = TRUE))
# Both algorithms yield fairly similar results.
plot(respons.exact$bagdistance, respons.approx$bagdistance)
abline(a = 0, b = 1)
# In Hubert et al. (2015) it was shown that for elliptical
# distributions the squared bagdistance relates to the
# squared Mahalanobis distances. This may be easily illustrated.
mahDist <- mahalanobis(x = X, colMeans(X), cov(X))
plot(respons.exact$bagdistance^2, mahDist)
# Computation of the bagdistance relies on the computation
# of halfspace depth using the hdepth function. Options for
# the hdepth routine can be passed down using the options
# arguments. Note that the bagdistance is only affine invariant
# if the halfspace depth is computed in an affine invariant way.
options <- list(type = "Rotation",
                ndir = 375,
                approx = TRUE,
                seed = 78341)
respons.approx.rot <- bagdistance(x = X, options = options)
plot(respons.exact$bagdistance, respons.approx.rot$bagdistance)
abline(a = 0, b = 1)