Computes the Stahel-Donoho outlyingness (SDO) of p-dimensional points z relative to a p-dimensional dataset x. For each multivariate point z_i, its outlyingness relative to x is defined as its maximal univariate Stahel-Donoho outlyingness measured over all directions. To obtain the univariate Stahel-Donoho outlyingness in the direction v, the dataset x is projected on v, and the robustly standardized distance of v'z_i to the robust center of the projected data points xv is computed.
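The definition above can be written compactly as follows. This is a sketch that assumes the median as robust center and the MAD as robust scale (the Examples mention the MAD as the default scale); x_j denotes the j-th row of x.

```latex
\mathrm{SDO}(z_i; x) \;=\; \sup_{v \in \mathbb{R}^p,\ \lVert v \rVert = 1}
\frac{\bigl| v^\top z_i - \operatorname{med}_j\,(v^\top x_j) \bigr|}
     {\operatorname{MAD}_j\,(v^\top x_j)}
```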
outlyingness(x, z = NULL, options = list())

x 
An n by p data matrix. 
z 
An optional m by p matrix containing row-wise the points z_i for which to compute the outlyingness.
If 
options 
A list of available options:

The Stahel-Donoho outlyingness was introduced by Stahel (1981) and Donoho (1982). It is best suited to measuring the degree of outlyingness of multivariate points with respect to a data cloud from an elliptical distribution.
Depending on the dimension p, different approximate algorithms are implemented. The affine invariant algorithm can only be used when n > p. It draws p observations at random from x, ndir times, and each time considers the direction orthogonal to the hyperplane spanned by these p observations. At most choose(n, p) directions can be considered. The orthogonal invariant version can be applied to high-dimensional data. It draws 2 observations at random from x, ndir times, and each time considers the direction through these observations. Here, at most choose(n, 2) directions can be considered. Finally, the shift invariant version randomly draws ndir vectors from the unit sphere.
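The three ways of generating a direction can be illustrated in base R. This is only a sketch of the geometry and not the package's C++ implementation; the helper names `unit`, `dir_shift`, `dir_rotation` and `dir_affine` are hypothetical.

```r
# Illustrative base-R sketch; helper names are assumptions, not package functions.
set.seed(1)
x <- matrix(rnorm(60), ncol = 3)   # toy data: n = 20, p = 3
n <- nrow(x); p <- ncol(x)

unit <- function(v) v / sqrt(sum(v^2))

# Shift invariant flavour: a random point on the unit sphere.
dir_shift <- function(p) unit(rnorm(p))

# Orthogonal invariant flavour: direction through two observations.
dir_rotation <- function(x, idx) unit(x[idx[1], ] - x[idx[2], ])

# Affine invariant flavour: unit normal of the hyperplane through p observations.
dir_affine <- function(x, idx) {
  D <- sweep(x[idx[-1], , drop = FALSE], 2, x[idx[1], ])  # (p-1) x p differences
  Q <- qr.Q(qr(t(D)), complete = TRUE)                    # orthonormal basis of R^p
  Q[, ncol(x)]                                            # orthogonal to all differences
}

idx2 <- sample(n, 2)
idxp <- sample(n, p)
v_shift <- dir_shift(p)
v_rot   <- dir_rotation(x, idx2)
v_aff   <- dir_affine(x, idxp)

# The p selected observations share one projection on the affine direction.
range(x[idxp, ] %*% v_aff)
```

The affine direction is only well defined when the p drawn observations are in general position; a degenerate draw would make the difference matrix rank deficient.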
The resulting Stahel-Donoho outlyingness values are invariant to affine transformations, rotations and shifts, respectively, provided that the seed is kept fixed across runs of the algorithm. Note that the SDO values are guaranteed not to decrease when more directions are considered, provided the seed is kept fixed, as this ensures that the random directions are generated in a fixed order.
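The monotonicity property follows because a maximum over a larger set of directions cannot be smaller than the maximum over a subset. A minimal base-R sketch, assuming median/MAD standardization and random unit directions (the helper `sdo_sketch` is hypothetical and is not the package function):

```r
# Hypothetical helper: projection outlyingness over the rows of a direction matrix V.
sdo_sketch <- function(x, V) {
  proj <- x %*% t(V)                         # n x ndir matrix of projections
  ctr  <- apply(proj, 2, median)             # robust center per direction
  scl  <- apply(proj, 2, mad)                # robust scale per direction
  std  <- sweep(abs(sweep(proj, 2, ctr)), 2, scl, "/")
  apply(std, 1, max)                         # maximum over all directions
}

set.seed(42)
x <- rbind(matrix(rnorm(100), ncol = 2), c(6, 6))   # 50 regular points and one outlier

# Directions generated once, in a fixed order (this mimics keeping the seed fixed).
set.seed(123)
V <- matrix(rnorm(200), ncol = 2)
V <- V / sqrt(rowSums(V^2))

sdo50  <- sdo_sketch(x, V[1:50, ])   # first 50 directions
sdo100 <- sdo_sketch(x, V)           # the same 50 plus 50 more

# No value decreases, up to floating-point tolerance.
all(sdo100 >= sdo50 - 1e-12)
```

If the second run used a different set of directions, the first 50 would no longer be shared and the comparison could fail for some observations.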
An observation from x or z is flagged as an outlier if its SDO exceeds a cutoff value. This cutoff value is determined using the procedure in Rousseeuw et al. (2018). First, the logarithm of the SDO values is taken to render their distribution more symmetric, after which a normal approximation yields a cutoff on these values. The cutoff is then transformed back by applying the exponential function.
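The log-transform, normal approximation and back-transform can be sketched as below. The choice of median and MAD to fit the normal approximation and the 0.99 quantile are assumptions made for illustration; the exact constants used by the package may differ. The helper `sdo_cutoff_sketch` is hypothetical.

```r
# Hypothetical helper illustrating the cutoff idea; constants are assumptions.
sdo_cutoff_sketch <- function(sdo, prob = 0.99) {
  lsdo <- log(sdo)                              # symmetrize the distribution
  exp(median(lsdo) + mad(lsdo) * qnorm(prob))   # normal cutoff, transformed back
}

set.seed(7)
sdo_vals <- c(rexp(50) + 0.5, 25)   # toy positive outlyingness values, one extreme
cutoff   <- sdo_cutoff_sketch(sdo_vals)

# Same convention as flagX and flagZ in the Examples: FALSE marks an outlier.
flag <- sdo_vals <= cutoff
which(!flag)
```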
It is first checked whether the data lie in a subspace of dimension smaller than p. If so, a warning is given, as well as the dimension of the subspace and a direction which is orthogonal to it. Moreover, from the definition of the Stahel-Donoho outlyingness it follows that the outlyingness is ill-defined when the robust scale of the data projected on the direction v equals zero. In this case the algorithm will stop and give a warning. The returned values then include the direction v as well as an indicator specifying which of the observations of x belong to the hyperplane orthogonal to v.
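Both degenerate situations can be reproduced on toy data in base R. This sketch uses a singular value decomposition for the subspace check and the MAD as robust scale; both are assumptions for illustration and need not match the package internals.

```r
# 1. Data lying in a lower-dimensional subspace (a line in the plane).
set.seed(3)
s  <- rnorm(30)
x1 <- cbind(s, 2 * s + 1)

sv <- svd(scale(x1, center = TRUE, scale = FALSE))
subspace_dim <- sum(sv$d > 1e-8 * max(sv$d))   # numerical rank of the centered data
v_orth <- sv$v[, ncol(x1)]                     # direction orthogonal to the subspace
mad(drop(x1 %*% v_orth))                       # numerically zero

# 2. Zero robust scale along a direction: more than half of the
#    observations share the same projection.
x2 <- rbind(cbind(rnorm(20), 0), cbind(rnorm(10), rnorm(10)))
v2 <- c(0, 1)
proj <- drop(x2 %*% v2)
scale_is_zero <- mad(proj) == 0                # the outlyingness is ill-defined here
in_hyperplane <- proj == median(proj)          # observations in the hyperplane orthogonal to v2
sum(in_hyperplane)
```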
A list with components:
outlyingnessX 
Vector of length n giving the outlyingness of the observations in x.
outlyingnessZ 
Vector of length m giving the outlyingness of the points in z.
cutoff 
Points whose outlyingness exceeds this cutoff can be considered as outliers with respect to 
flagX 
Observations of 
flagZ 
Points of 
singularSubsets 
When the input parameter type is equal to 
dimension 
When the data 
hyperplane 
When the data 
inSubspace 
When a direction v is found such that the robust scale of xv is zero, the observations from 
P. Segaert using C++ code by K. Vakili and P. Segaert.
Stahel W.A. (1981). Robuste Schätzungen: infinitesimale Optimalität und Schätzungen von Kovarianzmatrizen. PhD Thesis, ETH Zürich.
Donoho D.L. (1982). Breakdown properties of multivariate location estimators. Ph.D. Qualifying paper, Dept. Statistics, Harvard University, Boston.
Maronna R.A., Yohai V. (1995). The behavior of the Stahel-Donoho robust multivariate estimator. Journal of the American Statistical Association, 90, 330–341.
Rousseeuw P.J., Raymaekers J., Hubert M. (2018). A Measure of Directional Outlyingness with Applications to Image Data and Video. Journal of Computational and Graphical Statistics, 27, 345–359.
projdepth, projmedian, adjOutl, dirOutl
# Compute the outlyingness of a simple two-dimensional dataset.
# Outliers are plotted in red.
if (requireNamespace("robustbase", quietly = TRUE)) {
  BivData <- log(robustbase::Animals2)
} else {
  BivData <- matrix(rnorm(120), ncol = 2)
  BivData <- rbind(BivData, matrix(c(6,6, 6, 2), ncol = 2))
}
Result <- outlyingness(x = BivData)
IndOutliers <- which(!Result$flagX)
plot(BivData)
points(BivData[IndOutliers,], col = "red")
# The number of directions may be specified through
# the option list. The resulting outlyingness is
# monotone increasing in the number of directions.
Result1 <- outlyingness(x = BivData,
                        options = list(ndir = 50)
)
Result2 <- outlyingness(x = BivData,
                        options = list(ndir = 100)
)
which(Result2$outlyingnessX - Result1$outlyingnessX < 0)
# This is, however, not the case when the seed is changed.
Result1 <- outlyingness(x = BivData,
                        options = list(ndir = 50)
)
Result2 <- outlyingness(x = BivData,
                        options = list(ndir = 100,
                                       seed = 950)
)
plot(Result2$outlyingnessX - Result1$outlyingnessX,
     xlab = "Index", ylab = "Difference in outlyingness")
# We can also consider directions through two data
# points. If the sample is small enough one may opt
# to search over all choose(n,2) directions.
# Note that the computational load increases dramatically
# as n becomes larger.
Result <- outlyingness(x = BivData,
                       options = list(type = "Rotation",
                                      ndir = "all")
)
IndOutliers <- which(!Result$flagX)
plot(BivData)
points(BivData[IndOutliers,], col = "red")
# Alternatively one may consider randomly generated directions.
Result <- outlyingness(x = BivData,
                       options = list(type = "Shift",
                                      ndir = 1000)
)
IndOutliers <- which(!Result$flagX)
plot(BivData)
points(BivData[IndOutliers,], col = "red")
# The default option of using the MAD for the scale may be
# changed to using the univariate mcd.
Result <- outlyingness(x = BivData,
                       options = list(type = "Affine",
                                      ndir = 1000,
                                      stand = "MedMad",
                                      h = nrow(BivData))
)
IndOutliers <- which(!Result$flagX)
plot(BivData)
points(BivData[IndOutliers,], col = "red")
