outstah: Outlyingness measures

View source: R/outstah.R

outstahR Documentation

Outlyingness measures

Description

Functions calculating outlyingness for the data observations (= rows of a matrix X). Outlyingness quantifies (in a relative scale) how far is an observation from the bulk (center) of the data. Such a measure can be used for detecting outliers and/or weighting the row observatiosn in robust regression methods.

- outstah:

The function computes the Stahel-Donoho outlyingness (Maronna & Yohai 1995, Hubert et al. 2005, Daszykowski et al. 2007).

Outlyingness is calculated from the projections of the observation to a set of directions. The set of directions consists in the n directions corresponding to the rows of X, eventually completed by a number of nsim directions. In the function, the nsim directions are simulated as proposed in Hubert et al. (2005): random couples of observations are sampled in matrix X and, for each couple, the simulated direction is the one passing through the two observations of the couple (see functions .simpp.hub in file zfunctions.R).

- outeucl:

Outlyingness is calculated by the Euclidean distance between the observation and a robust estimate of the center of the data (either the column-wise median or the spatial median). The euclidean distance is then scaled by the median of the n calculated Euclidean distances. Such outlyingness was for instance used in the robust PLSR (PRM) algorithm of Serneels et al. 2005.

- outsdod:

Outlyingness is calculated from a fitted score space. For instance, a PCA (or PLS) is preliminary fitted with a given algorithm (ideally robust). Then, score (SD) and orthogonal (OD) distances are calculated for the fitted score space and standardized by cutoffs (see scordis and odis). The outlyingness is then computed by sqrt(.5 * SD_stand^2 + .5 * OD_stand^2).

Usage


outstah(X, scale = TRUE, nsim = 1500)

outeucl(X, scale = TRUE, spatial = FALSE)

outsdod(fm, X,
    ncomp = NULL, 
    robust = FALSE, alpha = .01)

Arguments

X

A n x p matrix or data frame of variables.

scale

If TRUE (default), for outstah, matrix X is preliminary centered and scaled by column-wise medians and MADs, respectively, and for outeucl, matrix X is preliminary scaled by column-wise MADs.

nsim

For outstah. A number of randomly simulated directions considered in addition to the n data observations (rows of X).

spatial

For outeucl. Logical. If TRUE, the center of the data is calculated by the spatial median (computed by function .xmedspa available in file zfunctions.R; this function uses the fast code of rrcov v.1.4-3 available on R CRAN; Thanks to V. Todorov, 2016). If FALSE (default), the center of the data is calculated by the column-wise medians.

fm

For outsdod. The PCA model from wich are calculated the outlyingness.

ncomp

For outsdod. See scordis and odis.

robust

For outsdod. See scordis and odis.

alpha

For outsdod. See scordis and odis.

Value

A vector of outlyningness (length n).

References

Daszykowski, M., Kaczmarek, K., Vander Heyden, Y., Walczak, B., 2007. Robust statistics in data analysis - A review. Chemometrics and Intelligent Laboratory Systems 85, 203-219. https://doi.org/10.1016/j.chemolab.2006.06.016

Hoffmann, I., Serneels, S., Filzmoser, P., Croux, C., 2015. Sparse partial robust M regression. Chemometrics and Intelligent Laboratory Systems 149, 50-59. https://doi.org/10.1016/j.chemolab.2015.09.019

Hubert, M., Rousseeuw, P.J., Vanden Branden, K., 2005. ROBPCA: A New Approach to Robust Principal Component Analysis. Technometrics 47, 64-79. https://doi.org/10.1198/004017004000000563

Maronna, R.A., Yohai, V.J., 1995. The Behavior of the Stahel-Donoho Robust Multivariate Estimator. Journal of the American Statistical Association 90, 330-341. https://doi.org/10.1080/01621459.1995.10476517

Serneels, S., Croux, C., Filzmoser, P., Van Espen, P.J., 2005. Partial robust M-regression. Chemometrics and Intelligent Laboratory Systems 79, 55-64. https://doi.org/10.1016/j.chemolab.2005.04.007

Examples


n <- 6
p <- 4
set.seed(1)
X <- matrix(rnorm(n * p, mean = 10), ncol = p, byrow = TRUE)
set.seed(NULL)
X

outstah(X)
outeucl(X)

fm <- pca_rob(X, ncomp = 2)
outsdod(fm, X)


mlesnoff/rnirs documentation built on April 24, 2023, 4:17 a.m.